if-statement: Subset with condition in data table

Monday, February 1, 2021

Subset with condition in data table

Assume we have data like this:

tmp <- data.table(id1 = c(1,1,1,1,2,2,2,3,3), time=c(1,2,3,4,1,2,3,1,2), user_id=c(1,1,1,1,2,2,2,1,1) )

For each user_id, I want all the rows except the ones with time > 2 when id1 == max(id1).

I currently use the following code, and it gives me warning messages like this:

tmp1 <- tmp[, if (id1 == max(id1)) .SD[time <= 2,] else .SD  , by="user_id"] 

Warning messages:
1: In if (id1 == max(id1)) .SD[time <= 2, ] else .SD :
  the condition has length > 1 and only the first element will be used
2: In if (id1 == max(id1)) .SD[time <= 2, ] else .SD :
  the condition has length > 1 and only the first element will be used

I guess this is because the condition of the if/else statement is not vectorized. So I changed my code to the following:

tmp2 <- tmp[, ifelse(id1 == max(id1), .SD[time <= 2,] , .SD)  , by="user_id"]

Error in `[.data.table`(tmp, , ifelse(id1 == max(id1), .SD[time <= 2,  : 
  Supplied 4 items for column 5 of group 1 which has 6 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
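A quick check of what the condition evaluates to inside each group may help explain both messages. This is only a diagnostic sketch; the names cond_len, n_rows and cond_length are made up for illustration. It suggests that id1 == max(id1) is a logical vector with one element per row of the group, whereas if expects a single TRUE or FALSE, and ifelse works element by element rather than choosing between two whole tables.

```r
library(data.table)

tmp <- data.table(id1     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                  time    = c(1, 2, 3, 4, 1, 2, 3, 1, 2),
                  user_id = c(1, 1, 1, 1, 2, 2, 2, 1, 1))

# For each user_id, compare the number of rows in the group with the
# length of the logical condition that was passed to if / ifelse.
cond_len <- tmp[, .(n_rows      = .N,
                    cond_length = length(id1 == max(id1))),
                by = user_id]

print(cond_len)
```

In every group the condition is as long as the group itself, so it cannot be used directly to pick one of two data tables.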

How can I correct my code?

Thanks!
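One possible way to get the described result, sketched under the assumption that max(id1) is meant to be taken within each user_id group: instead of choosing between two tables, build a single row-wise logical filter and subset .SD with it. The name res is hypothetical.

```r
library(data.table)

tmp <- data.table(id1     = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                  time    = c(1, 2, 3, 4, 1, 2, 3, 1, 2),
                  user_id = c(1, 1, 1, 1, 2, 2, 2, 1, 1))

# Within each user_id, drop only the rows that satisfy BOTH conditions:
# id1 is the group's maximum AND time is greater than 2.
# All other rows are kept unchanged.
res <- tmp[, .SD[!(id1 == max(id1) & time > 2)], by = user_id]

print(res)
```

Because the filter is a logical vector with one value per row, no if or ifelse is needed. With the sample data, the only row this removes is the one with user_id 2, id1 2, time 3.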
