Tuesday, June 6, 2017

R: evaluating multiple conditionals multiple times

I have data like this:

df = as.data.frame(cbind(
  event1 = c(88.76,96.04,99.60,88.76,99.60,34.04,96.04,87.03,87.44,87.44),
  time1 = c(0.100,0.033,0.000,0.117,0.000,0.000,0.050,0.500,0.133,0.117),
  event2 = c(NA,99.60,NA,34.04,99.62,88.76,87.44,87.41,88.76,88.76),
  time2 = c(NA,0.050,NA,0.100,0.017,0.083,0.200,0.500,0.133,0.050),
  event100 = c(NA,89.52,NA,34.04,93.93,34.02,88.76,88.01,88.01,87.41),
  time100 = c(NA,0.050,NA,0.100,0.033,0.117,0.300,0.500,0.233,0.300),
  event_88.76_within_0.1 = rep(0,10)
))

Each row is one subject. event1 is the code of the subject's first event and time1 is how long it took (in minutes) before event1 happened. Each subject has up to 100 event/time pairs (only event1, event2 and event100 are shown here).

I am trying to create a variable (event_88.76_within_0.1) that indicates whether event 88.76 happened within 0.1 minutes. It should equal 1 if any of a subject's events equals 88.76 and the corresponding time to event is less than or equal to 0.1, and 0 otherwise.
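For a single subject, this rule reduces to one vectorized test over that subject's event and time vectors. A minimal sketch; `occurred_within` is a hypothetical helper name, and the two example subjects are rows 6 and 4 of the data above:

```r
# Hypothetical helper: TRUE if `code` occurs at a time <= `window` for one subject.
# `events` and `times` are that subject's event codes and times, in matching order.
occurred_within = function(events, times, code = 88.76, window = 0.1) {
  # Missing events or times give NA, which any(na.rm = TRUE) treats as "no match"
  any(events == code & times <= window, na.rm = TRUE)
}

# Subject 6: 88.76 is the second event, at 0.083
occurred_within(c(34.04, 88.76, 34.02), c(0.000, 0.083, 0.117))  # TRUE
# Subject 4: 88.76 is the first event, but at 0.117
occurred_within(c(88.76, 34.04, 34.04), c(0.117, 0.100, 0.100))  # FALSE
```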

Using this nested for loop:

for(r in 1:nrow(df)){ #for each subject
  for(c in seq(1, 5, by = 2)){ #for each event code column (1, 3, 5); its time is in column c+1
    if(!is.na(df[r, c]) && !is.na(df[r, c+1]) && df[r, c] == 88.76 && df[r, c+1] <= 0.1){
#if the event code and its time are not missing, it's the needed event code, and
#the next column over (the corresponding time to event) is at most 0.1
      df[r,"event_88.76_within_0.1"] = 1
    }
  }
}

I can get this, which is what I want:

      event1 time1 event2 time2 event100 time100 event_88.76_within_0.1
 [1,]  88.76 0.100     NA    NA       NA      NA                      1
 [2,]  96.04 0.033  99.60 0.050    89.52   0.050                      0
 [3,]  99.60 0.000     NA    NA       NA      NA                      0
 [4,]  88.76 0.117  34.04 0.100    34.04   0.100                      0
 [5,]  99.60 0.000  99.62 0.017    93.93   0.033                      0
 [6,]  34.04 0.000  88.76 0.083    34.02   0.117                      1
 [7,]  96.04 0.050  87.44 0.200    88.76   0.300                      0
 [8,]  87.03 0.500  87.41 0.500    88.01   0.500                      0
 [9,]  87.44 0.133  88.76 0.133    88.01   0.233                      0
[10,]  87.44 0.117  88.76 0.050    87.41   0.300                      1

But the data set has thousands of subjects (each with 100 possible events), so the nested for loops take a while to run.
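One way to drop both loops, sketched under the assumption that the event and time columns are named `event<N>` and `time<N>` and appear in the same order: compare the whole block of event columns and the whole block of time columns at once, which gives a subjects-by-events logical matrix, then collapse each row with `rowSums()`.

```r
df = data.frame(
  event1 = c(88.76,96.04,99.60,88.76,99.60,34.04,96.04,87.03,87.44,87.44),
  time1 = c(0.100,0.033,0.000,0.117,0.000,0.000,0.050,0.500,0.133,0.117),
  event2 = c(NA,99.60,NA,34.04,99.62,88.76,87.44,87.41,88.76,88.76),
  time2 = c(NA,0.050,NA,0.100,0.017,0.083,0.200,0.500,0.133,0.050),
  event100 = c(NA,89.52,NA,34.04,93.93,34.02,88.76,88.01,88.01,87.41),
  time100 = c(NA,0.050,NA,0.100,0.033,0.117,0.300,0.500,0.233,0.300)
)

# Assumption: event<N> and time<N> columns appear in matching order
event_cols = grep("^event[0-9]+$", names(df))
time_cols = grep("^time[0-9]+$", names(df))

# Each comparison on a data frame returns a logical matrix (subjects x events);
# & combines them cell by cell, so hit[r, k] is TRUE when event k of subject r qualifies
hit = (df[event_cols] == 88.76) & (df[time_cols] <= 0.1)

# A subject qualifies if any cell in its row is TRUE; NA cells (missing events) are ignored
df$event_88.76_within_0.1 = as.integer(rowSums(hit, na.rm = TRUE) > 0)
df$event_88.76_within_0.1
# [1] 1 0 0 0 0 1 0 0 0 1
```

With 100 event/time pairs the two `grep()` calls pick up all of them, so the same three lines cover the full data set.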

I would like to vectorize the above loop to something like this:

df$event_88.76_within_0.1 = 0
df$event_88.76_within_0.1[df[ "events that equal 88.76 and occurred within 0.1" ]]=1

but I haven't had any luck.
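The assignment in the form sketched above can also be filled in by looping over the event/time column pairs instead of over subjects, so there are at most 100 iterations, each vectorized across all subjects. A sketch under the same column-naming assumption; `pair_hits` and `any_hit` are illustrative names:

```r
df = data.frame(
  event1 = c(88.76,96.04,99.60,88.76,99.60,34.04,96.04,87.03,87.44,87.44),
  time1 = c(0.100,0.033,0.000,0.117,0.000,0.000,0.050,0.500,0.133,0.117),
  event2 = c(NA,99.60,NA,34.04,99.62,88.76,87.44,87.41,88.76,88.76),
  time2 = c(NA,0.050,NA,0.100,0.017,0.083,0.200,0.500,0.133,0.050),
  event100 = c(NA,89.52,NA,34.04,93.93,34.02,88.76,88.01,88.01,87.41),
  time100 = c(NA,0.050,NA,0.100,0.033,0.117,0.300,0.500,0.233,0.300)
)

# Assumption: event<N> and time<N> columns appear in matching order
event_cols = grep("^event[0-9]+$", names(df), value = TRUE)
time_cols = grep("^time[0-9]+$", names(df), value = TRUE)

# One vectorized test per event/time pair, each a logical vector over all subjects;
# NA results (missing event or time) are turned into FALSE
pair_hits = Map(function(e, t) {
  hit = df[[e]] == 88.76 & df[[t]] <= 0.1
  !is.na(hit) & hit
}, event_cols, time_cols)

# A subject qualifies if any of its pairs is a hit
any_hit = Reduce(`|`, pair_hits)

df$event_88.76_within_0.1 = 0
df$event_88.76_within_0.1[any_hit] = 1
```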

Any help would be greatly appreciated.
