I have a data frame, df:
set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9,times = 9),
x = runif(81, min = 0, max = 5))
This is a dataframe which has 9 locations. For each location, I have 9 months and for each month, there is a value of x.
For each location, I want to select a month based on the following criteria:
1) Check which months (excluding month 9) have x > 1, then select the one that is closest to month 9. For example, if for location 1 the values of x are
4.56, 3.41, 0.82, 2.31, 3.75, 4.75, 1.22, 2.98, 1.17
then months 1, 2, 4, 5, 6, 7 and 8 have x > 1, and of these, month 8 is closest to month 9. So month 8 will be selected.
2) If none of the months have x > 1, simply select the month with the highest x value. For example:
If, for a location, x is
0.8, 0.6, 0.95, 0.4, 0.88, 0.7, 0.6, 0.45, 0.3
then month 3 will be selected (x = 0.95)
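To make the two rules concrete, here is a minimal base R sketch applied to the two example vectors above. The helper name pick_month is hypothetical, and it assumes that the fallback in rule 2 also leaves out month 9, which the description does not state explicitly.

```r
# Hypothetical helper: x holds one location's values for months 1..length(x),
# and the last element is the reference month (month 9 here).
pick_month <- function(x) {
  candidates <- head(seq_along(x), -1)      # months 1..8, month 9 excluded
  above <- candidates[x[candidates] > 1]    # candidate months with x > 1
  if (length(above) > 0) {
    max(above)                              # rule 1: latest such month is closest to month 9
  } else {
    which.max(x[candidates])                # rule 2 (assumed to exclude month 9): highest x
  }
}

ex1 <- c(4.56, 3.41, 0.82, 2.31, 3.75, 4.75, 1.22, 2.98, 1.17)
ex2 <- c(0.8, 0.6, 0.95, 0.4, 0.88, 0.7, 0.6, 0.45, 0.3)
pick_month(ex1)  # 8
pick_month(ex2)  # 3
```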
I tried this:
library(dplyr)
df %>%
  filter(month != 9) %>%             # drop month 9 so only months 1 to 8 are evaluated
  group_by(loc.id) %>%
  mutate(select.month = x > 1) %>%   # mark the months where x > 1
  filter(select.month == TRUE) %>%   # keep only the months where x > 1
  mutate(dif = 9 - month) %>%        # distance of each month from month 9
  summarise(month.id = min(dif))     # smallest distance to month 9 for each location
However, the code above cannot handle locations where no month has x > 1, because those locations are dropped by the filter. My question is: how do I change the code so that it also checks condition 2 when none of the x values is > 1?
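One possible direction, as a sketch rather than a definitive answer: keep all of months 1 to 8 in each group and decide between the two rules inside summarise(). This assumes that condition 2 also ignores month 9, and it returns the selected month itself rather than its distance from month 9.

```r
library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:9, each = 9), month = rep(1:9, times = 9),
                 x = runif(81, min = 0, max = 5))

result <- df %>%
  filter(month != 9) %>%                 # evaluate months 1 to 8 only
  group_by(loc.id) %>%
  summarise(
    month.id = if (any(x > 1)) {
      max(month[x > 1])                  # condition 1: latest month with x > 1
    } else {
      month[which.max(x)]                # condition 2: month with the highest x
    }
  )

result
```

Because the if/else is evaluated once per group, a location with no x > 1 falls through to the which.max() branch instead of being filtered out.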