I have a data frame called dataSessions with three columns, "Timestamp", "CookieID" and "Name", and over 1.3 million rows. It is sorted by CookieID and then by Timestamp.
I want to create a new column called "Sessions" that contains 1 or 0 according to the following criteria. A row gets 1 if either:
1) the CookieID differs from the CookieID in the previous row, or
2) more than 30 minutes have passed since the previous entry for the same CookieID.
Otherwise the row gets 0.
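Written as a rule over consecutive rows, assuming the data is sorted by CookieID and then Timestamp, with $c_i$ the CookieID and $t_i$ the Timestamp of row $i$ (notation introduced here for illustration), the criteria read:

$$
\text{Sessions}_i =
\begin{cases}
1 & \text{if } i = 1 \text{ or } c_i \neq c_{i-1} \text{ or } t_i - t_{i-1} > 30\ \text{min},\\
0 & \text{otherwise.}
\end{cases}
$$

The first row has no predecessor, so it always starts a session. The time condition only matters when $c_i = c_{i-1}$, because a change of cookie already yields 1.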
I tried a for loop with an if statement that checks, row by row, whether the CookieID is the same as in the previous row. But on 1.3 million rows this takes a very long time. Is there a faster, more efficient way to do this?
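```r
library(dplyr)  # lag(x, n = 1) below is dplyr's lag(), not stats::lag()

# Previous row's CookieID (NA for the first row)
dataSessions$Test <- lag(dataSessions$CookieID, n = 1)
# Checks criterion 1 only; the 30-minute rule is not handled yet
for (i in seq_along(dataSessions$CookieID)) {
  # %in% returns FALSE (not NA) for the first row, so it gets 1
  if (dataSessions$CookieID[i] %in% dataSessions$Test[i]) {
    dataSessions$New[i] <- 0
  } else {
    dataSessions$New[i] <- 1
  }
}
```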
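The loop above can likely be replaced by a single vectorized comparison. The loop's slowness probably comes from modifying the data frame once per row. The sketch below uses base R only and a small hypothetical sample in place of the real dataSessions. It compares each CookieID with the one in the previous row by dropping the first and last elements, which avoids the need for dplyr's lag(). Like the loop, it handles criterion 1 only.

```r
# Hypothetical sample standing in for the real dataSessions.
dataSessions <- data.frame(
  CookieID = c(223284, 223223, 223223, 223223, 245458, 245458, 245458, 245458)
)

n <- nrow(dataSessions)

# Compare every CookieID with the previous row's in one vectorized step.
# The first row has no predecessor, so it is always flagged TRUE.
newCookie <- c(TRUE, dataSessions$CookieID[-1] != dataSessions$CookieID[-n])

# Convert TRUE/FALSE to 1/0, like the New column built by the loop.
dataSessions$New <- as.integer(newCookie)
print(dataSessions$New)
# [1] 1 1 0 0 1 0 0 0
```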
Here is an example of the data, and the SESSIONS column I want generated:
```
Timestamp            CookieID  Name  SESSIONS
2015-08-28 15:46:03  223284    A     1
2015-09-19 22:26:50  223223    A     1
2015-09-19 22:27:09  223223    A     0
2015-09-19 22:28:11  223223    A     0
2015-09-20 22:29:14  245458    B     1
2015-09-20 22:30:17  245458    B     0
2015-09-20 23:05:01  245458    B     1
2015-09-20 23:06:15  245458    B     0
```
As shown, Sessions is 1 only when a new CookieID begins, or when the previous entry for the same CookieID is more than 30 minutes old.
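Both criteria can probably be combined without a loop. The sketch below is in base R and rebuilds the example data from the question. It assumes that Timestamp is (or has been converted to) POSIXct, that the rows are sorted by CookieID and then Timestamp, and that the times are in UTC. The UTC timezone is an assumption made only for this example. A gap of exactly 30 minutes does not start a new session, because the question says "over 30 minutes".

```r
# Example data from the question (timezone "UTC" is an assumption).
dataSessions <- data.frame(
  Timestamp = as.POSIXct(c(
    "2015-08-28 15:46:03", "2015-09-19 22:26:50", "2015-09-19 22:27:09",
    "2015-09-19 22:28:11", "2015-09-20 22:29:14", "2015-09-20 22:30:17",
    "2015-09-20 23:05:01", "2015-09-20 23:06:15"
  ), tz = "UTC"),
  CookieID = c(223284, 223223, 223223, 223223, 245458, 245458, 245458, 245458),
  Name = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# Assumes rows are already sorted by CookieID and then Timestamp.
n <- nrow(dataSessions)

# Criterion 1: CookieID differs from the previous row's.
# The first row has no previous row, so it always starts a session.
newCookie <- c(TRUE, dataSessions$CookieID[-1] != dataSessions$CookieID[-n])

# Criterion 2: more than 30 minutes since the previous row.
# The gap across a cookie change is meaningless, but those rows are
# already flagged by criterion 1, so the OR below still gives 1.
gapMinutes <- c(Inf, as.numeric(difftime(dataSessions$Timestamp[-1],
                                        dataSessions$Timestamp[-n],
                                        units = "mins")))
longGap <- gapMinutes > 30

dataSessions$SESSIONS <- as.integer(newCookie | longGap)
print(dataSessions$SESSIONS)
# [1] 1 1 0 0 1 0 1 0
```

The gap between 22:30:17 and 23:05:01 for cookie 245458 is about 34.7 minutes, so that row gets 1, which matches the desired output. Every step is a whole-vector operation, so the work should grow roughly linearly with the number of rows and should not need 1.3 million separate loop iterations. If the real Timestamp column is stored as text, convert it with as.POSIXct() first.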