Thursday, 28 September 2017

if statement and for loop to append new data to original data frame

I have some model outputs that I want to integrate back into the original data file. I was able to do this using nested ifelse() calls; however, I want to generalize the process so that I can run it as a batch process across multiple data sets. This is what I originally tried.

The model outputs correspond to time chunks, while each original data point is associated with a discrete time.

I decided to run one day at a time manually (here is an example of one parameter on one day), and with this very large and ugly ifelse() I was able to aggregate the data correctly.

track[,"phase"]= ifelse((phaseTable1$start[1]<=track$Time)& (track$Time< phaseTable1$end[1]), phaseTable1$phase[1],
                  ifelse((phaseTable1$start[2]<=track$Time)& (track$Time< phaseTable1$end[2]), phaseTable1$phase[2],
                         ifelse((phaseTable1$start[3]<=track$Time)& (track$Time< phaseTable1$end[3]), phaseTable1$phase[3],
                                ifelse((phaseTable1$start[4]<=track$Time)& (track$Time< phaseTable1$end[4]), phaseTable1$phase[4],
                                       ifelse((phaseTable1$start[5]<=track$Time)& (track$Time< phaseTable1$end[5]), phaseTable1$phase[5],
                                              ifelse((phaseTable1$start[6]<=track$Time)& (track$Time< phaseTable1$end[6]), phaseTable1$phase[6],
                                                     ifelse((phaseTable1$start[7]<=track$Time)& (track$Time< phaseTable1$end[7]), phaseTable1$phase[7],
                                                            ifelse((phaseTable1$start[8]<=track$Time)& (track$Time< phaseTable1$end[8]), phaseTable1$phase[8],
                                                                   ifelse((phaseTable1$start[9]<=track$Time)& (track$Time< phaseTable1$end[9]), phaseTable1$phase[9],
                                                                          ifelse((phaseTable1$start[10]<=track$Time)& (track$Time< phaseTable1$end[10]), phaseTable1$phase[10],
                                                                                 ifelse((phaseTable1$start[11]<=track$Time)& (track$Time< phaseTable1$end[11]), phaseTable1$phase[11],
                                                                                        ifelse((phaseTable1$start[12]<=track$Time)& (track$Time< phaseTable1$end[12]), phaseTable1$phase[12],
                                                                                               ifelse((phaseTable1$start[13]<=track$Time)& (track$Time< phaseTable1$end[13]), phaseTable1$phase[13],
                                                                                                      ifelse((phaseTable1$start[14]<=track$Time)& (track$Time<phaseTable1$end[14]), phaseTable1$phase[14],
                                                                                                             ifelse((phaseTable1$start[15]<=track$Time)& (track$Time< phaseTable1$end[15]), phaseTable1$phase[15],
                                                                                                                    ifelse((phaseTable1$start[16]<=track$Time)& (track$Time< phaseTable1$end[16]), phaseTable1$phase[16],
                                                                                                                           ifelse((phaseTable1$start[17]<=track$Time)& (track$Time< phaseTable1$end[17]), phaseTable1$phase[17],
                                                                                                                                  ifelse((phaseTable1$start[18]<=track$Time)& (track$Time< phaseTable1$end[18]), phaseTable1$phase[18],
                                                                                                                                         ifelse((phaseTable1$start[19]<=track$Time)& (track$Time< phaseTable1$end[19]), phaseTable1$phase[19],
                                                                                                                                                ifelse((phaseTable1$start[20]<=track$Time)& (track$Time< phaseTable1$end[20]), phaseTable1$phase[20],
                                                                                                                                                       ifelse((phaseTable1$start[21]<=track$Time)& (track$Time< phaseTable1$end[21]), phaseTable1$phase[21],
                                                                                                                                                              ifelse((phaseTable1$start[22]<=track$Time)& (track$Time< phaseTable1$end[22]), phaseTable1$phase[22],
                                                                                                                                                                     ifelse((phaseTable1$start[23]<=track$Time)& (track$Time< phaseTable1$end[23]), phaseTable1$phase[23],
                                                                                                                                                                            ifelse((phaseTable1$start[24]<=track$Time)& (track$Time< phaseTable1$end[24]), phaseTable1$phase[24], 
                                                                                                                                                                                   ifelse((phaseTable1$start[25]<=track$Time)& (track$Time< phaseTable1$end[25]), phaseTable1$phase[25],
                                                                                                                                                                                          ifelse((phaseTable1$start[26]<=track$Time)& (track$Time< phaseTable1$end[26]), phaseTable1$phase[26], 
                                                                                                                                                                                                 ifelse((phaseTable1$start[27]<=track$Time)& (track$Time< phaseTable1$end[27]), phaseTable1$phase[27],
                                                                                                                                                                                                        ifelse((phaseTable1$start[28]<=track$Time)& (track$Time< phaseTable1$end[28]), phaseTable1$phase[28],
                                                                                                                                                                                                              ifelse((phaseTable1$start[29]<=track$Time)& (track$Time< phaseTable1$end[29]), phaseTable1$phase[29],
                                                                                                                                                                                                                      ifelse((phaseTable1$start[30]<=track$Time)& (track$Time< phaseTable1$end[30]), phaseTable1$phase[30],
                                                                                                                                                                                                                             ifelse((phaseTable1$start[31]<=track$Time)& (track$Time< phaseTable1$end[31]), phaseTable1$phase[31], 
                                                                                                                                                                                                                                    ifelse((phaseTable1$start[32]<=track$Time)& (track$Time< phaseTable1$end[32]), phaseTable1$phase[32],
                                                                                                                                                                                                                                          ifelse((phaseTable1$start[33]<=track$Time)& (track$Time< phaseTable1$end[33]), phaseTable1$phase[33],
                                                                                                                                                                                                                                                  ifelse((phaseTable1$start[34]<=track$Time)& (track$Time< phaseTable1$end[34]), phaseTable1$phase[34],
                                                                                                                                                                                                                                                         ifelse((phaseTable1$start[35]<=track$Time)& (track$Time< phaseTable1$end[35]), phaseTable1$phase[35],phaseTable1$phase[35]

                                                                                                                                                                                                                      )))))))))))))))))))))))))))))))))))

This worked; however, it is quite unwieldy, and the number of nested conditions varies from day to day within the data.

I tried to rework this into a more practical loop:

for ( j in 1:nrow(phaseTable1)){
if((phaseTable1$start[j]<=track$Time)&(track$Time< phaseTable1$end[j])){track$tau== phaseTable1$tau[j]}

 }

and I constantly get this warning, which results in no data being aggregated:

In if ((phaseTable1$start[j] <= track$Time) & (track$Time <  ... the condition has length > 1 and only the first element will be used
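For context, a minimal sketch of what the warning points at, under assumptions: `if` expects a single TRUE/FALSE, while the comparison against `track$Time` yields one value per row, and `==` inside the braces compares rather than assigns. The toy `track` and `phaseTable1` below are hypothetical stand-ins for the real data, and the logical-index assignment is only one possible way to write the loop.

```r
# Hypothetical toy data standing in for the real `track` and `phaseTable1`
track <- data.frame(Time = c(0.5, 1.5, 2.5, 3.5))
phaseTable1 <- data.frame(
  start = c(0, 1, 2),
  end   = c(1, 2, 3),
  phase = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Start with "no phase assigned" for every row
track$phase <- NA_character_

for (j in seq_len(nrow(phaseTable1))) {
  # Logical vector with one element per row of `track`
  in_chunk <- phaseTable1$start[j] <= track$Time &
    track$Time < phaseTable1$end[j]
  # Assign (<-) only to the rows that fall inside chunk j
  track$phase[in_chunk] <- phaseTable1$phase[j]
}

print(track)
```

In this sketch, rows whose time falls outside every chunk stay NA, so nothing is silently filled in.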

I tried it again like this

    for ( j in 1:nrow(phaseTable1)){
        track$phase<-ifelse((phaseTable1$start[j]<=track$Time)&(track$Time< phaseTable1$end[j]), phaseTable1$phase[j], "")
    }

And the new columns appear in the data frame but they are empty.
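One possible reading of this result, offered as an assumption rather than a confirmed diagnosis: each pass of the loop rewrites the whole column, so every row outside chunk j is reset to "", and after the last pass only rows inside the last chunk keep a value. The sketch below uses hypothetical toy data and passes the current column as the "no" argument so that earlier matches survive.

```r
# Hypothetical toy data standing in for the real `track` and `phaseTable1`
track <- data.frame(Time = c(0.5, 1.5, 2.5, 3.5))
phaseTable1 <- data.frame(
  start = c(0, 1, 2),
  end   = c(1, 2, 3),
  phase = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Initialise once, outside the loop
track$phase <- ""

for (j in seq_len(nrow(phaseTable1))) {
  in_chunk <- phaseTable1$start[j] <= track$Time &
    track$Time < phaseTable1$end[j]
  # Keep the existing value where the row is not in chunk j,
  # instead of overwriting it with ""
  track$phase <- ifelse(in_chunk, phaseTable1$phase[j], track$phase)
}

print(track)
```

Only the last row, whose time lies beyond every chunk, should remain "" here.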

I tried again using a wrapper from the thatssorandom package recommended in a blog post, which also resulted in an error.

for ( j in 1:nrow(phaseTable1)){
ie(
  i(((phaseTable1$start[j]<=track$Time)&(track$Time< phaseTable1$end[j])),track$phase<- phaseTable1$phase[j]),
e("na"))

  }

Is there an obvious mistake I'm making, or is there another way to achieve what I am trying to do? I admit I'm a relatively inexperienced R user; I have explored other ifelse forum questions but haven't been able to figure out what I'm doing wrong. I have a working loop that lets me run my models day by day within the data frame. If I can get this next loop to run, I will be able to nest it inside the first loop and aggregate the data in batches. Any insight into what the solution might be would be much appreciated!
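To make the batch goal concrete, here is a rough sketch under stated assumptions: each day's phase table is sorted by `start` and its chunks do not overlap. `assign_phase` and the `days` list are hypothetical names invented for illustration; `findInterval()` is base R and is used here in place of the nested ifelse() chain.

```r
# Hypothetical helper: look up the value of `column` for each time point.
# Assumes `phase_table` is sorted by `start` and its chunks do not overlap.
assign_phase <- function(times, phase_table, column = "phase") {
  # Index of the last chunk whose start is <= time (0 if before the first)
  idx <- findInterval(times, phase_table$start)
  idx[idx == 0] <- NA
  # A time only belongs to the chunk if it is also before that chunk's end
  inside <- !is.na(idx) & times < phase_table$end[idx]
  out <- phase_table[[column]][idx]
  out[!inside] <- NA
  out
}

# Hypothetical batch: one track and one phase table per day
days <- list(
  day1 = list(
    track  = data.frame(Time = c(0.5, 1.5, 3.5)),
    phases = data.frame(start = c(0, 1), end = c(1, 2),
                        phase = c("a", "b"), stringsAsFactors = FALSE)
  ),
  day2 = list(
    track  = data.frame(Time = c(10.2, 12.7)),
    phases = data.frame(start = c(10, 12), end = c(12, 14),
                        phase = c("x", "y"), stringsAsFactors = FALSE)
  )
)

# Apply the same lookup to every day, whatever its number of chunks
results <- lapply(days, function(d) {
  d$track$phase <- assign_phase(d$track$Time, d$phases)
  d$track
})

print(results)
```

Because the lookup never refers to a fixed number of chunks, the same call should cover days with 35 chunks or any other count; the `column` argument is there so that another parameter such as tau could be looked up the same way.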
