I have an example data.frame (df, below) which contains values (size) of a variable at three time steps: sz_t1, sz_t2, sz_t3. The variables t1_t2, t2_t3, t1_t3 are binary indicators of whether an individual (ID) ‘survived’ from time step 1 to 2, 2 to 3, or 1 to 3.
Instead of using time steps, I would like to use ‘age’ as the unit of time, that is, use the first non-zero value per ID as the starting point. For example, if sz_t1 is zero, t1_t2 is recorded as NA. Using ‘age’, however, if sz_t1 is zero but sz_t2 is greater than zero, then age1_2 can be recorded with a zero or one outcome for survival. With this age-shifted time unit, I would also like to know the value (size) at each age (sz_age1, etc.).
The R code below uses ifelse statements to achieve the desired results for the example data.frame (df). However, as the number of time steps increases, I feel that there might be a less verbose, or ‘cleaner’, method to achieve the results I seek. As more time steps get added, I am not sure I'll be able to keep track of the ifelse chains.
I've tried to find information on creating groups based on the index of the first non-zero column, and then lagging those groups by the appropriate value. However, I did not find a way to do this, at least not in wide format, or for different lags per group.
Is there another R package or command that could achieve these results and reduce the length of the chained ifelse statements?
df <- structure(list(ID = 1:5, sz_t1 = c(0.5, 0.25, 0, 0, 0.25), sz_t2 = c(0.6,
0.25, 0.25, 0.55, 0), sz_t3 = c(0, 0.35, 0.35, 0, 0)), .Names = c("ID",
"sz_t1", "sz_t2", "sz_t3"), class = "data.frame", row.names = c(NA,
-5L))
# did the id 'survive' from t1 to t2, etc
df$t1_t2 <- ifelse(df$sz_t1 > 0, ifelse(df$sz_t2 >0,1,0), NA)
df$t2_t3 <- ifelse(df$sz_t2 > 0, ifelse(df$sz_t3 >0,1,0), NA)
df$t1_t3 <- ifelse(df$sz_t1 > 0, ifelse(df$sz_t3 >0,1,0), NA)
# "age"
# did the id 'survive' from age1 to age2, etc
df$age1_2 <- ifelse(df$sz_t1 > 0, ifelse(df$sz_t2 >0,1,0),
ifelse(df$sz_t2 > 0, ifelse(df$sz_t3 >0,1,0), NA))
# if zero in first time step, age 2 to age 3 is NA as this time has yet to elapse
df$age2_3 <-ifelse(df$age1_2 > 0, ifelse(df$sz_t1 > 0,
ifelse(df$sz_t2 > 0, ifelse(df$sz_t3 > 0,1,0),NA),NA),NA)
# for the moment this is the same as df$t1_t3: a non-zero sz_t1 is needed to get a value for age1_3,
# otherwise NA, as this time period has yet to elapse
df$age1_3 <- ifelse(df$sz_t1 > 0, ifelse(df$sz_t3 >0,1,0), NA)
# what was the size at the ages
df$sz_age1 <- ifelse(df$sz_t1 > 0, df$sz_t1, df$sz_t2)
df$sz_age2 <- ifelse(df$sz_t1 > 0, df$sz_t2, df$sz_t3)
df$sz_age3 <- ifelse(df$sz_t1 > 0, df$sz_t3, NA)
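For reference, here is a rough base R sketch of the kind of generalisation I have in mind, not a definitive solution. It finds the first non-zero column per ID, shifts each row left so that column becomes age 1, and then derives every age-to-age survival indicator from the shifted sizes. The names first, sz_age, pairs, surv and result are my own placeholders, and it assumes the size columns all start with "sz_t" and are in time order.

```r
# Example data from the question (size columns only)
df <- data.frame(
  ID    = 1:5,
  sz_t1 = c(0.5, 0.25, 0, 0, 0.25),
  sz_t2 = c(0.6, 0.25, 0.25, 0.55, 0),
  sz_t3 = c(0, 0.35, 0.35, 0, 0)
)

# Size columns as a matrix; assumes they are named sz_t* and ordered by time
sz  <- as.matrix(df[, grep("^sz_t", names(df))])
n_t <- ncol(sz)

# Column index of the first non-zero size per ID (NA if the ID is never non-zero)
first <- apply(sz > 0, 1, function(x) if (any(x)) which(x)[1] else NA_integer_)

# Shift each row left so the first non-zero size becomes age 1; pad the tail with NA
sz_age <- do.call(rbind, lapply(seq_len(nrow(sz)), function(i) {
  if (is.na(first[i])) return(rep(NA_real_, n_t))
  x <- unname(sz[i, first[i]:n_t])
  c(x, rep(NA_real_, n_t - length(x)))
}))
colnames(sz_age) <- paste0("sz_age", seq_len(n_t))

# Every pair of ages (a < b): 1/0 if alive at age a, NA if not alive at age a
# or if age b has not yet elapsed
pairs <- combn(n_t, 2)
surv <- apply(pairs, 2, function(p) {
  ifelse(sz_age[, p[1]] > 0, as.integer(sz_age[, p[2]] > 0), NA)
})
colnames(surv) <- paste0("age", pairs[1, ], "_", pairs[2, ])

result <- cbind(df, surv, sz_age)
```

On the example data this gives the same age1_2, age2_3, age1_3 and sz_age1 to sz_age3 columns as the ifelse chains above, and adding a sz_t4 column should need no further changes, though I have not checked it beyond this small example.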