Tuesday, 12 February 2019

Speed up a double for loop in R

I am having trouble with how long it takes to run a double for loop containing an if statement in R. One data set (DF1) has about 3,000,000 rows and the other (DF2) has about 22. Examples of the two data frames are given below.

DF1
DateTime                 REG
2018-07-01 12:00:00      NHDG
2018-07-12 11:55:23      NSKR

DF2
StartDateTime           EndDateTime         Direction
2018-07-01 07:55:11    2018-07-01 12:01:56     W
2018-07-12 11:00:23    2018-07-12 11:45:00     E

I want to flag every row in DF1 whose DateTime falls between the StartDateTime and EndDateTime of any row in DF2. The expected output is:

DF1  
DateTime                 REG      Flag
2018-07-01 12:00:00      NHDG      1
2018-07-12 11:55:23      NSKR      0

The code I have used currently is:

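# Flag whether each DF1 row falls within a delay period (1) or not (0)
DF1$Flag <- 0

for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    # Both comparisons must index row i; && is the scalar form of "and"
    if (DF1$DateTime[i] >= DF2$StartDateTime[j] && DF1$DateTime[i] <= DF2$EndDateTime[j]) {
      DF1$Flag[i] <- 1
    }
    # No else branch is needed: the flag keeps its current value otherwise
  }
}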

I am more than happy for this code to be taken out of the for loops if possible.
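
One possible way to drop the inner work on 3,000,000 rows is to loop only over the roughly 22 rows of DF2 and compare the whole DateTime column at once. The sketch below is a minimal illustration in base R, not a benchmarked solution. It assumes the date columns are stored as POSIXct (the "UTC" time zone is an assumption made only to build the sample data), and `in_any` is a helper name introduced here for illustration.

```r
# Sample data rebuilt from the example tables above.
# Assumption: the date columns are POSIXct; tz = "UTC" is chosen only for this sketch.
DF1 <- data.frame(
  DateTime = as.POSIXct(c("2018-07-01 12:00:00", "2018-07-12 11:55:23"), tz = "UTC"),
  REG = c("NHDG", "NSKR"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  StartDateTime = as.POSIXct(c("2018-07-01 07:55:11", "2018-07-12 11:00:23"), tz = "UTC"),
  EndDateTime = as.POSIXct(c("2018-07-01 12:01:56", "2018-07-12 11:45:00"), tz = "UTC"),
  Direction = c("W", "E"),
  stringsAsFactors = FALSE
)

# in_any (hypothetical helper name) records whether each DF1 row
# falls inside at least one DF2 interval.
in_any <- rep(FALSE, nrow(DF1))

# Loop over the small table only; each comparison is vectorised over all of DF1.
for (j in seq_len(nrow(DF2))) {
  in_any <- in_any |
    (DF1$DateTime >= DF2$StartDateTime[j] & DF1$DateTime <= DF2$EndDateTime[j])
}

# Convert the logical result to the 0/1 flag used in the question.
DF1$Flag <- as.integer(in_any)
print(DF1)
```

With this layout the number of loop iterations equals the number of rows in DF2, and each iteration does one vectorised pass over DF1, so the element-by-element assignment into `DF1$Flag` disappears.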
