mercredi 17 novembre 2021

condition on multiple rows and multiple columns by ID in r long format table

I have a table in long format like follows:

dt <- data.table(ID=rep(1,3), time=1:3, x=c("N","N","Y"), z=c(NA,NA,"Y"))
dt
   ID time x    z
1:  1    1 N <NA>
2:  1    2 N <NA>
3:  1    3 Y   YY
4:  2    1 Y <NA>
5:  2    2 N <NA>

Now, I want to create a variable based on two possible combinations:

setDT(dt)[ID==1, group := ifelse((time==1 & x=="Y") | (time==3 & z=="YY"), "Y", "N"), by=ID][
          time==2, group := NA]
dt
   ID time x    z group
1:  1    1 N <NA>     N
2:  1    2 N <NA>  <NA>
3:  1    3 Y   YY     Y
4:  2    1 Y <NA>  <NA>
5:  2    2 N <NA>  <NA>

The problem here is that group values change over time, while I want them to be constant by ID. The desired output is:

   ID time x    z group
1:  1    1 N <NA>     Y
2:  1    2 N <NA>  <NA>
3:  1    3 Y   YY     Y
4:  2    1 Y <NA>  <NA>
5:  2    2 N <NA>  <NA>

I've tried %in% instead of == but it didn't solve the problem.

Thank you!

Aucun commentaire:

Enregistrer un commentaire