lundi 14 décembre 2020

How to select a group from a column based on values from another column using conditional statements

I have a sample dataset as follows:

    df <- data.frame(column_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), RequestDate = c(2019-07-01, 2019-07-02, 2019-07-03, 2019-07-04,2019-07-05,2019-07-06,2019-07-07,2019-07-08,2019-07-09,2019-07-10, 2019-07-02, 2019-07-03, 2019-07-04,2019-07-05,2019-07-06,2019-07-07,2019-07-08,2019-07-09,2019-07-10, 2019-07-11, 2019-07-03, 2019-07-04,2019-07-05,2019-07-06,2019-07-07,2019-07-08,2019-07-09,2019-07-10, 2019-07-11, 2019-07-12, 2019-07-04,2019-07-05,2019-07-06,2019-07-07,2019-07-08,2019-07-09,2019-07-10, 2019-07-11, 2019-07-12,2019-07-13, 2019-07-05,2019-07-06,2019-07-07,2019-07-08,2019-07-09,2019-07-10, 2019-07-11, 2019-07-12,2019-07-13,2019-07-14), P95_Stats = c(7, 20, 54, 30, 12, 50, 22, 65, 70, 45, 13, 34, 54, 80, 60, 43, 57, 90, 11, 165, 45, 76, 91, 81, 66, 39, 49, 47, 63, 87, 22, 45, 15, 22, 96, 99, 37, 84, 55, 77, 19, 42, 101, 93, 67, 82, 10, 26, 44, 50)) 

So here each group is a 10-day period. 
Group 1 = Day 1-Day 10 (07/01- 07/10)
Group 2 = Day 2-Day 11 (07/02 - 07/11)
Group 3 = Day 3-Day 12 (07/03 - 07/12)
Group 4 = Day 4-Day 13 (07/04 - 07/13)
Group 5 = Day 5-Day 15 (07/05 - 07/15)

I want to be able to pick a candidate 10 day period using if else statement as follows: 

If 1 or more 10=day periods have the max day, hour, and 3-hour:
   Pick the 10-day period with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max hour, and 3-hour:
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max day, hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max day, 3-hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else 1 or more 10=day periods have the max 1-hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P valueenter code here

If 1 or more 10=day periods have the max day, hour, and 3-hour:
   Pick the 10-day period with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max hour, and 3-hour:
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max day, hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else If 1 or more 10=day periods have the max day, 3-hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value

Else 1 or more 10=day periods have the max 1-hour,
   Pick the one with max number of 1-hour and 3-hour periods above the 95P value 


I already have stuff in my code that calculates the max_day, max_hour_day, max_3hr_day, max_hr_p95, and max_3hr_pp5. Lets say for this example:

    max_day = 2019-07-13 (coming from RequestDate column)
    max_hour_day = 2019-07-05 (coming from RequestDate column)
    max_3hr_day = 2019-07-11 (coming from RequestDate column)
    max_hr_p95 = 60 (coming from P95_Stats column)
    max_3hr_p95 = 44 (coming from P95_Stats column)

Now the problem is that I don't know how to extract the periods that contain these variables defined above. I want to find the best 10-day period

If someone can help me to make the conditional statements, it would be really helpful

Aucun commentaire:

Enregistrer un commentaire