Friday, 2 March 2018

Selecting rows based on multiple conditions

I have a data frame:

set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9,times = 9), 
                 x = runif(81, min = 0, max = 5))

This data frame has 9 locations. For each location there are 9 months, and for each month there is a value of x.

For each location, I want to select a month based on the following criteria:

1) Check which months (excluding month 9) have x > 1, then select the one that is closest to month 9. For example, if the values of x for location 1 are

  4.56, 3.41, 0.82, 2.31, 3.75, 4.75, 1.22, 2.98, 1.17

then months 1, 2, 4, 5, 6, 7 and 8 have x > 1, and of these, month 8 is closest to month 9. So month 8 will be selected.

2) If none of the months have x > 1, simply select that month which has the highest x value. For example:

If for a location, x is

  0.8, 0.6, 0.95, 0.4, 0.88, 0.7, 0.6, 0.45, 0.3

then month 3 will be selected (x = 0.95)
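
To make the two rules concrete, here is a minimal sketch in base R for a single location. The helper name `pick_month` and its `last` and `threshold` arguments are made up for illustration. It assumes month 9 is excluded from both rules (rule 2 does not say so explicitly) and that ties in rule 2 go to the earliest month.

```r
# Hypothetical helper: pick a month for one location.
# `month` and `x` are parallel vectors for that location.
# Assumption: month `last` (9 here) is never a candidate, in either rule.
pick_month <- function(month, x, last = 9, threshold = 1) {
  keep  <- month != last
  month <- month[keep]
  x     <- x[keep]
  above <- x > threshold
  if (any(above)) {
    # Rule 1: every candidate is below `last`, so the closest one is the latest
    max(month[above])
  } else {
    # Rule 2: month with the highest x (which.max returns the first on ties)
    month[which.max(x)]
  }
}

# The two examples from the text
pick_month(1:9, c(4.56, 3.41, 0.82, 2.31, 3.75, 4.75, 1.22, 2.98, 1.17))  # 8
pick_month(1:9, c(0.8, 0.6, 0.95, 0.4, 0.88, 0.7, 0.6, 0.45, 0.3))        # 3
```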

I tried this:

  library(dplyr)
  df %>%
    filter(month != 9) %>%              # drop month 9 so only months 1 to 8 are evaluated
    group_by(loc.id) %>%
    mutate(select.month = x > 1) %>%    # mark the months where x > 1
    filter(select.month == TRUE) %>%    # keep only the months where x > 1
    mutate(dif = 9 - month) %>%         # distance of each month from month 9
    summarise(month.id = min(dif))      # smallest distance to month 9 (the selected month is 9 minus this value)

However, the code above does not handle locations where every month has x of 1 or less: those locations are dropped by the filter. How do I change it so that condition 2 is also applied when none of the x values is > 1?
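
One possible way to cover both conditions in a single pipeline, offered as a sketch rather than a definitive answer, is to drop the `filter(x > 1)` step and branch inside `summarise()` instead. It assumes month 9 is excluded from condition 2 as well, and that ties for the highest x go to the earliest month. The name `result` is only for illustration.

```r
library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9, times = 9),
                 x = runif(81, min = 0, max = 5))

result <- df %>%
  filter(month != 9) %>%          # month 9 is never a candidate
  group_by(loc.id) %>%
  summarise(
    month.id = if (any(x > 1)) {
      max(month[x > 1])           # condition 1: latest month with x > 1
    } else {
      month[which.max(x)]         # condition 2: month with the highest x
    }
  )

result
```

Because the rows are not filtered on x before summarising, every location keeps one row in the output, including those where no month has x > 1.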
