Monday, 1 February 2021

ifelse with two conditions, one numeric and one categorical, in R

I have a dataset, a subset of which is:

structure(list(First.Name = c(5006L, 5006L, 5006L, 5006L, 5006L, 
5006L, 5006L, 5006L, 5006L, 5006L, 5006L, 5006L), TimePoint = c(NA, 
NA, NA, NA, NA, "PRE", NA, NA, NA, NA, NA, NA), Year_Day = c(125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136), Week_Year = c(18, 
18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20), Session = c("Pre", 
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", 
"Post", "Post")), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

It looks like this (asterisks mark the incorrect values):

# A tibble: 12 x 5
   First.Name TimePoint Year_Day Week_Year Session
        <int> <chr>        <dbl>     <dbl> <chr>  
 1       5006 NA             125        18 Pre    
 2       5006 NA             126        18 Pre    
 3       5006 NA             127        19 Pre    
 4       5006 NA             128        19 Pre    
 5       5006 NA             129        19 Pre    
 6       5006 PRE            130        19 Pre    
 7       5006 NA             131        19 **Pre**    
 8       5006 NA             132        19 **Pre**    
 9       5006 NA             133        19 **Pre**    
10       5006 NA             134        20 Post   
11       5006 NA             135        20 Post   
12       5006 NA             136        20 Post   

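I am trying to create a new column per subject (First.Name) called Session. It should contain "Pre" for every row from the beginning of the subject's data up to and including the row where the TimePoint column contains "PRE". All later rows should be "Post". As the rows marked above show, the cut-off has to fall on the "PRE" row itself, not at the end of its Week_Year.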

My ideal output from the subset above should be:

# A tibble: 12 x 5
   First.Name TimePoint Year_Day Week_Year Session
        <int> <chr>        <dbl>     <dbl> <chr>  
 1       5006 NA             125        18 Pre    
 2       5006 NA             126        18 Pre    
 3       5006 NA             127        19 Pre    
 4       5006 NA             128        19 Pre    
 5       5006 NA             129        19 Pre    
 6       5006 PRE            130        19 Pre    
 7       5006 NA             131        19 **Post**   
 8       5006 NA             132        19 **Post**   
 9       5006 NA             133        19 **Post**   
10       5006 NA             134        20 Post   
11       5006 NA             135        20 Post   
12       5006 NA             136        20 Post   

I am trying variations of

df %>%
  group_by(First.Name) %>%
  mutate(Session = ifelse(TimePoint == "PRE" & Week_Year <= first(Week_Year) + 1, "Pre", "Post")) %>%
  ungroup()

but it does not produce the expected output. Any help is appreciated.
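One likely reason the attempt above misbehaves: `TimePoint == "PRE"` evaluates to NA on every row where TimePoint is NA, and `ifelse()` returns NA whenever its test is NA, so only the single "PRE" row can ever become "Pre". A possible approach, sketched below, is to locate the position of the "PRE" row within each subject and compare it against the row number. This is only a sketch under some assumptions: rows are already sorted by Year_Day within each subject, each subject has at most one "PRE" row, and a subject with no "PRE" row gets "Post" everywhere. The helper column `pre_row` is a name made up for this illustration.

```r
library(dplyr)

# Rebuild the subset from the question (without the existing Session column)
df <- data.frame(
  First.Name = rep(5006L, 12),
  TimePoint  = c(rep(NA_character_, 5), "PRE", rep(NA_character_, 6)),
  Year_Day   = as.numeric(125:136),
  Week_Year  = c(18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(First.Name) %>%
  mutate(
    # Position of the first "PRE" row within the subject (NA if there is none);
    # pre_row is a hypothetical helper column used only for this sketch
    pre_row = match("PRE", TimePoint),
    # "Pre" up to and including that row, "Post" afterwards
    Session = if_else(!is.na(pre_row) & row_number() <= pre_row, "Pre", "Post")
  ) %>%
  ungroup() %>%
  select(-pre_row)

print(result)
```

If the data are not guaranteed to be in chronological order, adding `arrange(First.Name, Year_Day)` before `group_by()` would make the row positions meaningful. Note that Week_Year is not needed for this rule, since the cut-off is defined by the "PRE" row rather than by the week.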
