My R code is taking a very long time to run. It uses a nested (double) for loop with an if statement inside. One data frame (DF1) has about 3,000,000 rows and the other (DF2) has about 22. Examples of the two data frames are given below.
DF1
DateTime             REG
2018-07-01 12:00:00  NHDG
2018-07-12 11:55:23  NSKR

DF2
StartDateTime        EndDateTime          Direction
2018-07-01 07:55:11  2018-07-01 12:01:56  W
2018-07-12 11:00:23  2018-07-12 11:45:00  E
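For anyone who wants to run the code, the example data can be rebuilt as a reproducible sketch. This assumes the date-time columns are stored as POSIXct (if they were read in as text, as.POSIXct() converts them). The tz = "UTC" argument is an assumption, because the original time zone is not stated; what matters is that all three date-time columns use the same time zone, so the comparisons are made on the same instants.

```r
# Rebuild the example data. ASSUMPTION: times are POSIXct in UTC
# (the original post does not state the time zone).
DF1 <- data.frame(
  DateTime = as.POSIXct(c("2018-07-01 12:00:00", "2018-07-12 11:55:23"), tz = "UTC"),
  REG      = c("NHDG", "NSKR")
)
DF2 <- data.frame(
  StartDateTime = as.POSIXct(c("2018-07-01 07:55:11", "2018-07-12 11:00:23"), tz = "UTC"),
  EndDateTime   = as.POSIXct(c("2018-07-01 12:01:56", "2018-07-12 11:45:00"), tz = "UTC"),
  Direction     = c("W", "E")
)
str(DF1)
str(DF2)
```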
I want to flag every row of DF1 whose DateTime falls between the StartDateTime and EndDateTime of any row in DF2. For the example above, the output would be:
DF1
DateTime             REG   Flag
2018-07-01 12:00:00  NHDG  1
2018-07-12 11:55:23  NSKR  0
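The rule can be stated per timestamp: a row is flagged if its DateTime lies inside at least one DF2 window, which is exactly what any() over a vectorized comparison expresses. The sketch below uses a hypothetical helper, in_any_window(), to replace the inner loop over DF2 with one comparison per row. It still iterates over every row of DF1 in R, so it is only a partial speed-up.

```r
# Example data (ASSUMPTION: POSIXct, UTC)
DF1 <- data.frame(
  DateTime = as.POSIXct(c("2018-07-01 12:00:00", "2018-07-12 11:55:23"), tz = "UTC"),
  REG      = c("NHDG", "NSKR")
)
DF2 <- data.frame(
  StartDateTime = as.POSIXct(c("2018-07-01 07:55:11", "2018-07-12 11:00:23"), tz = "UTC"),
  EndDateTime   = as.POSIXct(c("2018-07-01 12:01:56", "2018-07-12 11:45:00"), tz = "UTC")
)

# Hypothetical helper: TRUE if `time` is inside at least one [start, end] window.
# any() replaces the inner for loop over the rows of DF2.
in_any_window <- function(time, start, end) any(start <= time & time <= end)

# Still one call per DF1 row, but no per-element writes into the data frame
DF1$Flag <- vapply(
  seq_len(nrow(DF1)),
  function(i) as.integer(in_any_window(DF1$DateTime[i], DF2$StartDateTime, DF2$EndDateTime)),
  integer(1)
)
print(DF1)
```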
The code I am currently using is:
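# Flag each DF1 row that falls inside any DF2 delay window
DF1$Flag <- 0
for (i in seq_len(nrow(DF1))) {
  for (j in seq_len(nrow(DF2))) {
    # Both comparisons use the i-th DateTime; && yields a single TRUE/FALSE for if()
    if (DF1$DateTime[i] >= DF2$StartDateTime[j] &&
        DF1$DateTime[i] <= DF2$EndDateTime[j]) {
      DF1$Flag[i] <- 1
      break  # one matching window is enough, so skip the remaining windows
    }
    # No else branch: Flag[i] already defaults to 0, and reassigning it
    # could undo a match found in an earlier window
  }
}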
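The nested loop performs about 3,000,000 × 22 ≈ 66 million scalar comparisons in interpreted R, and each DF1$Flag[i] <- assignment modifies a data frame column inside the loop, which can trigger copying. A sketch that keeps a loop but runs it over the 22 windows of DF2 instead of the 3 million rows of DF1, so that each comparison works on the whole DateTime column at once (column types assumed to be POSIXct, as above):

```r
# Example data (ASSUMPTION: POSIXct, UTC)
DF1 <- data.frame(
  DateTime = as.POSIXct(c("2018-07-01 12:00:00", "2018-07-12 11:55:23"), tz = "UTC"),
  REG      = c("NHDG", "NSKR")
)
DF2 <- data.frame(
  StartDateTime = as.POSIXct(c("2018-07-01 07:55:11", "2018-07-12 11:00:23"), tz = "UTC"),
  EndDateTime   = as.POSIXct(c("2018-07-01 12:01:56", "2018-07-12 11:45:00"), tz = "UTC")
)

# Start with nothing flagged, then OR in the matches for each window
flag <- logical(nrow(DF1))
for (j in seq_len(nrow(DF2))) {
  # One vectorized comparison over all of DF1 per window (about 22 passes in total)
  flag <- flag | (DF1$DateTime >= DF2$StartDateTime[j] &
                  DF1$DateTime <= DF2$EndDateTime[j])
}
DF1$Flag <- as.integer(flag)  # write the column once, after the loop
print(DF1)
```

Because the loop body is vectorized, the work done per iteration happens in compiled code, which is typically much faster than 66 million interpreted iterations.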
I would be more than happy to replace the for loops entirely if there is a faster way to do this.
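One loop-free base-R option (a swapped-in technique, not the only one) is findInterval() on the window start times sorted in ascending order. For each DateTime it binary-searches for the number of windows that have already started, so the cost is roughly O(n log m) instead of n × m comparisons. Taking cummax() of the end times in that sorted order gives the latest end among the started windows, which also handles overlapping windows. The sketch assumes the three date-time columns are POSIXct and converts them to numeric seconds.

```r
# Example data (ASSUMPTION: POSIXct, UTC)
DF1 <- data.frame(
  DateTime = as.POSIXct(c("2018-07-01 12:00:00", "2018-07-12 11:55:23"), tz = "UTC"),
  REG      = c("NHDG", "NSKR")
)
DF2 <- data.frame(
  StartDateTime = as.POSIXct(c("2018-07-01 07:55:11", "2018-07-12 11:00:23"), tz = "UTC"),
  EndDateTime   = as.POSIXct(c("2018-07-01 12:01:56", "2018-07-12 11:45:00"), tz = "UTC")
)

# Sort windows by start; max_end[k] is the latest end among the first k windows
ord     <- order(DF2$StartDateTime)
starts  <- as.numeric(DF2$StartDateTime[ord])
max_end <- cummax(as.numeric(DF2$EndDateTime[ord]))
times   <- as.numeric(DF1$DateTime)

# idx[i] = number of windows with start <= times[i] (0 if none has started yet)
idx <- findInterval(times, starts)

# A row is inside some window iff a window has started and the latest
# end among the started windows is at or after the row's time
flag    <- logical(length(times))
started <- idx > 0
flag[started] <- times[started] <= max_end[idx[started]]
DF1$Flag <- as.integer(flag)
print(DF1)
```

The data.table package's foverlaps() and non-equi joins are another commonly used route for this kind of interval matching.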