Thursday, 1 October 2015

Speed up a for loop with an if statement in R

I have a data frame called dataSessions with three columns, "Timestamp", "CookieID" and "Name", and over 1.3 million rows. It is sorted by CookieID and then by Timestamp.

I want to create a new column called "Sessions", which holds 1 or 0 for each row.

A row gets 1 if either of the following is true:

1) The CookieID is not the same as in the previous row
2) The CookieID is the same as in the previous row, but more than 30 minutes have passed since that row

I have tried writing a for loop with an if statement that goes through each row and checks whether the CookieID matches the one in the previous row. But this takes a very long time. Is there a quicker and more efficient way to do this?

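library(dplyr)  # assumed: lag(x, n = 1) is dplyr::lag; stats::lag does not shift a plain vector

# CookieID of the previous row (NA for the first row)
dataSessions$Test <- dplyr::lag(dataSessions$CookieID, n = 1)

for (i in seq_len(nrow(dataSessions))) {
  # 0 if the cookie matches the previous row, 1 otherwise
  if (dataSessions$CookieID[i] %in% dataSessions$Test[i]) {
    dataSessions$New[i] <- 0
  } else {
    dataSessions$New[i] <- 1
  }
}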
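
For comparison, here is a minimal base R sketch of the same check done without a loop. It covers only the first criterion (a change of CookieID), as the loop does. The toy values and the helper name prevCookie are made up for illustration and are not part of the real data.

```r
# Toy data with hypothetical values, already sorted by CookieID.
dataSessions <- data.frame(
  CookieID = c(223284, 223223, 223223, 245458, 245458)
)

n <- nrow(dataSessions)

# CookieID of the previous row; the first row has no predecessor, hence NA.
prevCookie <- c(NA, dataSessions$CookieID[-n])

# 1 where the cookie differs from the previous row (or there is no previous row),
# 0 otherwise. The whole column is computed in one vectorized comparison.
dataSessions$New <- as.integer(
  is.na(prevCookie) | dataSessions$CookieID != prevCookie
)

print(dataSessions)
```

Comparing whole vectors at once avoids modifying the data frame one element at a time, which is likely where most of the loop's running time goes.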

Here is a sample of the data, together with the Sessions column I want to generate:

Timestamp              CookieID     Name     Sessions
2015-08-28 15:46:03    223284       A        1
2015-09-19 22:26:50    223223       A        1
2015-09-19 22:27:09    223223       A        0
2015-09-19 22:28:11    223223       A        0
2015-09-20 22:29:14    245458       B        1
2015-09-20 22:30:17    245458       B        0
2015-09-20 23:05:01    245458       B        1
2015-09-20 23:06:15    245458       B        0

As shown, Sessions is 1 only when a new CookieID begins, or when the previous entry for the same CookieID is more than 30 minutes old.
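
Both criteria can be sketched in base R without a loop, assuming the data is already sorted by CookieID and Timestamp as described. The sample rows are the ones from the table above; the UTC time zone and the helper names prevCookie, gapMinutes, newCookie and longGap are assumptions made for this illustration only.

```r
# Sample rows from the table above; the time zone is an assumption.
dataSessions <- data.frame(
  Timestamp = as.POSIXct(c(
    "2015-08-28 15:46:03", "2015-09-19 22:26:50",
    "2015-09-19 22:27:09", "2015-09-19 22:28:11",
    "2015-09-20 22:29:14", "2015-09-20 22:30:17",
    "2015-09-20 23:05:01", "2015-09-20 23:06:15"
  ), tz = "UTC"),
  CookieID = c(223284, 223223, 223223, 223223,
               245458, 245458, 245458, 245458),
  Name = c("A", "A", "A", "A", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)

n <- nrow(dataSessions)

# CookieID of the previous row (NA for the first row).
prevCookie <- c(NA, dataSessions$CookieID[-n])

# Minutes elapsed since the previous row (NA for the first row).
gapMinutes <- c(NA, as.numeric(diff(dataSessions$Timestamp), units = "mins"))

# Criterion 1: the cookie differs from the previous row, or there is none.
newCookie <- is.na(prevCookie) | dataSessions$CookieID != prevCookie

# Criterion 2: more than 30 minutes since the previous row.
# Rows that start a new cookie are already flagged by criterion 1,
# so a large gap across two different cookies does no harm here.
longGap <- !is.na(gapMinutes) & gapMinutes > 30

dataSessions$Sessions <- as.integer(newCookie | longGap)

print(dataSessions)
```

On these eight rows the Sessions column comes out as 1, 1, 0, 0, 1, 0, 1, 0, matching the table. Every step works on whole columns, so it should scale to 1.3 million rows far better than a row-by-row loop.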
