Wednesday, 6 September 2017

How to make ifelse depend on the dynamically updated previous row in R

I want to use ifelse to update a column based on two conditions within each row, where the update also depends on the result for the previous row. Example data.frame:

# Example data: one row per record, trNO initialised to 1
trID     <- c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259)
prevTRID <- c(249, 249, 249, 249, 249, 249, 249, 249, 259, 259)
tDiff    <- c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)
trNO     <- rep(1, 10)
df <- data.frame(trID, prevTRID, tDiff, trNO)

The desired output is the following:

trID prevTRID tDiff trNO
249    249      100    1
249    249      100    1
249    249      10     1
249    249      950    2
249    249      200    2
249    249      820    3
249    249      250    3
259    249      NA     1
259    259      200    1
259    259      970    2

Basically, I want to update trNO from the previous row's result using two conditions:

  1. If trID is not equal to prevTRID, trNO is simply 1.
  2. If trID equals prevTRID, check whether tDiff is less than 800:
  3. If it is, trNO equals the previous trNO.
  4. If not, trNO equals the previous trNO + 1.
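To pin the rules down, here is a minimal sketch of them as a plain sequential loop in R. It assumes that the first row starts at 1 (it has no previous trNO), that an NA tDiff does not increment trNO, and it uses the strict "less than 800" from the list (the ifelse attempt uses <= 800; the two only differ when tDiff is exactly 800). The function name number_trips_loop is made up for illustration.

```r
# Example data from the question
trID     <- c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259)
prevTRID <- c(249, 249, 249, 249, 249, 249, 249, 249, 259, 259)
tDiff    <- c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)

# Hypothetical helper: applies the four rules row by row.
# Assumptions: row 1 starts at 1, and an NA tDiff does not increment trNO.
number_trips_loop <- function(trID, prevTRID, tDiff, threshold = 800) {
  n <- length(trID)
  trNO <- integer(n)
  for (i in seq_len(n)) {
    if (i == 1 || trID[i] != prevTRID[i]) {
      trNO[i] <- 1L                   # rule 1 (and the first row): start at 1
    } else if (is.na(tDiff[i]) || tDiff[i] < threshold) {
      trNO[i] <- trNO[i - 1]          # rule 3: keep the previous trNO
    } else {
      trNO[i] <- trNO[i - 1] + 1L     # rule 4: previous trNO + 1
    }
  }
  trNO
}

number_trips_loop(trID, prevTRID, tDiff)
# [1] 1 1 1 2 2 3 3 1 1 2
```

A loop like this is too slow for millions of rows, but it reproduces the desired output and is a convenient reference to check faster versions against.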

I wrote the following line:

df$trNO <- ifelse(df$trID == df$prevTRID, ifelse(df$tDiff <= 800, df$trNO[-1], df$trNO[-1]+1),1)

However, I got the following result, and I cannot figure out how to fix it:

trID prevTRID tDiff trNO
249    249      100    1
249    249      100    1
249    249      10     1
249    249      950    2
249    249      200    1
249    249      820    2
249    249      250    1
259    249      NA     1
259    259      200    1
259    259      970    2

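Clearly, ifelse cannot increment trNO based on the updated result of the previous row: it evaluates the whole vector at once, using the trNO values as they were before the call. I have a very large data frame that needs this kind of calculation, and I cannot find a way to do it.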
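One possible way around this, sketched under the assumptions that row 1 starts at 1, an NA tDiff never increments, and the threshold is 800: trNO is 1 plus a running count of the rows with tDiff >= 800 since the last reset, where a reset is any row with trID != prevTRID. A running count is exactly what cumsum() computes, so the "previous row" state can be carried without a loop. The function name number_trips_vec is hypothetical.

```r
# Example data from the question
trID     <- c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259)
prevTRID <- c(249, 249, 249, 249, 249, 249, 249, 249, 259, 259)
tDiff    <- c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)

# Hypothetical helper: trNO = 1 + running count of "big" tDiff values,
# restarted at every reset row; cumsum() carries the previous-row state.
# Assumptions: row 1 starts at 1, and an NA tDiff does not increment trNO.
number_trips_vec <- function(trID, prevTRID, tDiff, threshold = 800) {
  starts <- seq_along(trID) == 1 | trID != prevTRID          # rows where trNO resets to 1
  grp    <- cumsum(starts)                                   # run number of each row
  inc    <- as.integer(!is.na(tDiff) & tDiff >= threshold)   # 1 where trNO goes up
  inc[starts] <- 0L                                          # a reset row never increments
  cs <- cumsum(inc)                                          # running count over all rows
  1L + cs - cs[starts][grp]                                  # restart the count in each run
}

number_trips_vec(trID, prevTRID, tDiff)
# [1] 1 1 1 2 2 3 3 1 1 2
```

Subtracting the cumulative count at the start of each run (cs[starts][grp]) restarts the count per run, so the whole computation is a handful of vector operations and should scale far better than a loop. With the question's data frame, the call would be df$trNO <- number_trips_vec(df$trID, df$prevTRID, df$tDiff).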

I tried:

  1. A for loop: it takes a very long time for millions of rows.
  2. parallel: I couldn't figure out how to handle the row-to-row dependence across cores, and it calculated the same result ifelse did.
  3. foreach with %do% alone, to make it faster, but it was still very slow.
  4. tapply: I failed to pass multiple arguments (I couldn't figure out how to pass each row's tDiff), as follows:

    a <- tapply(df$trNO, df$trID, fun(x, df$tDiff){})
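tapply only splits its first argument by group; extra arguments given through its ... are passed to FUN unchanged, which is why each group's tDiff cannot be supplied that way. A sketch of a common base-R alternative: split() the whole data frame by group so each call sees every column for its rows, then unsplit() the results back into the original row order. The group ids here are built from the reset rule (row 1, or trID != prevTRID) rather than from trID alone, and number_group is a hypothetical helper; the same NA and threshold assumptions apply.

```r
# Example data from the question
df <- data.frame(
  trID     = c(249, 249, 249, 249, 249, 249, 249, 259, 259, 259),
  prevTRID = c(249, 249, 249, 249, 249, 249, 249, 249, 259, 259),
  tDiff    = c(100, 100, 10, 950, 200, 820, 250, NA, 200, 970)
)

# Hypothetical per-group function: it receives a data frame slice, so it can
# read tDiff (or any other column) for the rows of its group.
# Assumption: an NA tDiff does not increment trNO.
number_group <- function(g, threshold = 800) {
  inc <- as.integer(!is.na(g$tDiff) & g$tDiff >= threshold)
  inc[1] <- 0L                 # the first row of a group is always 1
  1L + cumsum(inc)
}

# Groups are runs that start at row 1 or wherever trID != prevTRID
grp <- cumsum(seq_len(nrow(df)) == 1 | df$trID != df$prevTRID)

# split() hands each group's rows (all columns) to the function;
# unsplit() puts the per-group results back in the original row order.
df$trNO <- unsplit(lapply(split(df, grp), number_group), grp)
df$trNO
# [1] 1 1 1 2 2 3 3 1 1 2
```

Because the R function is called once per group, this can still be slow when there are very many small groups.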

Any help is appreciated, either with ifelse or with the apply functions. Thanks.
