I want to use ifelse to update a column based on two conditions within each row, where the new value also depends on the result computed for the previous row. Example data.frame:
trID <- c(249,249,249,249,249,249,249,259,259,259)
prevTRID <- c(249,249,249,249,249,249,249,249,259,259)
tDiff <- c(100, 100,10, 950, 200, 820, 250, NA, 200, 970)
trNO <- rep(1,10)
df <- data.frame(cbind(trID, prevTRID, tDiff, trNO))
The desired output is the following:
trID prevTRID tDiff trNO
249 249 100 1
249 249 100 1
249 249 10 1
249 249 950 2
249 249 200 2
249 249 820 3
249 249 250 3
259 249 NA 1
259 259 200 1
259 259 970 2
Basically, I want to update trNO from the previous row's result, using two conditions:
- If trID is not equal to prevTRID, trNO is simply 1
- If trID equals prevTRID, check whether tDiff is less than 800
  - If it is, trNO equals the previous trNO
  - If not, trNO equals the previous trNO + 1
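Because trNO only ever resets to 1 or steps up by 1, these rules can probably be expressed as a cumulative sum within each run of rows, with no row-by-row loop. The following is a minimal sketch, not a tested solution for the full data. It assumes that trID and prevTRID contain no NA, that the very first row starts a run at 1, and that an NA in tDiff never triggers an increment. The names newRun, runID, and step are made up for illustration.

```r
trID     <- c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259)
prevTRID <- c(249, 249, 249, 249, 249, 249, 249, 249, 259, 259)
tDiff    <- c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)
df <- data.frame(trID, prevTRID, tDiff)

# TRUE where the counter resets: trID differs from prevTRID
# (assumption: the first row always starts a new run)
newRun <- df$trID != df$prevTRID
newRun[1] <- TRUE
runID <- cumsum(newRun)

# 1 where the counter steps up: same trID and tDiff above 800
# (assumption: NA in tDiff means "no step")
step <- as.integer(!newRun & !is.na(df$tDiff) & df$tDiff > 800)

# running count of steps inside each run, starting from 1
df$trNO <- 1 + ave(step, runID, FUN = cumsum)
```

On the example data this gives 1 1 1 2 2 3 3 1 1 2. Both cumsum and ave are vectorized base R functions, so the approach should scale far better than an explicit loop, although that has not been measured here.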
I wrote the following line:
df$trNO <- ifelse(df$trID == df$prevTRID, ifelse(df$tDiff <= 800, df$trNO[-1], df$trNO[-1]+1),1)
However, I got the following result, and I cannot figure out how to fix it:
trID prevTRID tDiff trNO
249 249 100 1
249 249 100 1
249 249 10 1
249 249 950 2
249 249 200 1
249 249 820 2
249 249 250 1
259 249 NA 1
259 259 200 1
259 259 970 2
Clearly, I cannot increment trNO based on the dynamically updated result of the previous row inside ifelse. I need to run this kind of calculation on a very large data frame, and I cannot find a way to do it.
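A small check of what the inner ifelse sees may explain the result. As far as I can tell, ifelse evaluates its yes and no vectors once, from the column as it was before the assignment, and df$trNO[-1] just drops the first element rather than pointing at the previous row. The sketch below uses standalone vectors instead of the data frame; shifted and res are illustrative names.

```r
trNO  <- rep(1, 10)
tDiff <- c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)

# negative indexing removes the first element: 9 values, all still 1
shifted <- trNO[-1]
length(shifted)

# both branches come from the original all-ones vector (recycled to
# length 10), so the result can only be 1, 2, or NA, never 3
res <- ifelse(tDiff <= 800, shifted, shifted + 1)
res
```

So nothing computed for one row is ever fed into the next, which matches the 1/2 pattern in the output above.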
I tried:
- a for-loop: takes a very long time on millions of rows
- parallel processing: I could not figure out how to handle the dependence between rows across cores, so it produced the same result as ifelse
- foreach with %do% only, to make it faster: still very slow
- the tapply function: I could not figure out how to pass multiple arguments (specifically, the row's tDiff values), as follows:
a <- tapply(df$trNO, df$trID, fun(x, df$tDiff){})
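One way around the single-argument limit of tapply might be to split the row indices by group instead of a single column, so that the function can look up any column it needs. This is only a sketch under assumptions: it groups by trID alone, so it relies on each trID forming one contiguous block whose first row is where the counter resets, and countSteps is a hypothetical helper, not an existing function.

```r
df <- data.frame(
  trID  = c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259),
  tDiff = c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)
)

# hypothetical helper: receives the row numbers of one group
countSteps <- function(idx) {
  d <- df$tDiff[idx]
  step <- !is.na(d) & d > 800   # assumption: NA means "no step"
  step[1] <- FALSE              # first row of a group starts at 1
  1 + cumsum(step)
}

# split row numbers by trID, apply per group, restore the original row order
groups <- split(seq_len(nrow(df)), df$trID)
df$trNO <- unsplit(lapply(groups, countSteps), df$trID)
```

unsplit puts the per-group results back in the original row order, which a plain unlist of tapply output would only do if the data were already sorted by group.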
Any help is appreciated either with ifelse or apply functions. Thanks.