I am dealing with a DB of 2 million rows and my for loop is taking too long.
The database has three variables with data (Gift_ID, ind_id and gift_date) and two empty variables, min and max, that I want to fill in. For every distinct ID (ind_id) I want to identify the first gift date and the last gift date. I've written these two loops, which work as long as the DB is ordered by ind_id and then gift_date, but when I run them on the whole DB they take too long. Any ideas for another way of writing this?
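    n = nrow(NewGifts)

    # Forward pass: the first row of each ind_id holds its earliest gift date
    # (row 1 has no predecessor, so it is set before the loop)
    NewGifts$min[1] = format.Date(NewGifts$gift_date[1], '%Y%m%d')
    for (i in 2:n){
      if(NewGifts$ind_id[i] != NewGifts$ind_id[i-1]){
        NewGifts$min[i] = format.Date(NewGifts$gift_date[i], '%Y%m%d')
      } else {
        NewGifts$min[i] = NewGifts$min[i-1]
      }
    }

    # Backward pass: the last row of each ind_id holds its latest gift date
    # (row n has no successor, so it is set before the loop)
    NewGifts$max[n] = format.Date(NewGifts$gift_date[n], '%Y%m%d')
    for (i in ((n-1):1)){
      if(NewGifts$ind_id[i] != NewGifts$ind_id[i+1]){
        NewGifts$max[i] = format.Date(NewGifts$gift_date[i], '%Y%m%d')
      } else {
        NewGifts$max[i] = NewGifts$max[i+1]
      }
    }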
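For comparison, the same per-ID first and last dates can probably be computed without an explicit loop. The following is only a minimal base R sketch, assuming gift_date is stored as a Date; the sample rows are made up for illustration and stand in for the real NewGifts. It uses ave(), which applies a function within each ind_id group and returns a vector as long as the input, so the rows do not need to be sorted first.

```r
# Hypothetical sample data standing in for NewGifts (values are invented)
NewGifts <- data.frame(
  Gift_ID   = 1:5,
  ind_id    = c(10, 10, 20, 20, 20),
  gift_date = as.Date(c("2015-03-01", "2014-01-15", "2016-07-04",
                        "2016-02-10", "2017-12-25"))
)

# Work on the numeric day counts so ave() returns plain numbers
d <- as.numeric(NewGifts$gift_date)

# Group-wise minimum and maximum, repeated on every row of the group
first_gift <- as.Date(ave(d, NewGifts$ind_id, FUN = min), origin = "1970-01-01")
last_gift  <- as.Date(ave(d, NewGifts$ind_id, FUN = max), origin = "1970-01-01")

# Same text format as in the loops
NewGifts$min <- format(first_gift, "%Y%m%d")
NewGifts$max <- format(last_gift, "%Y%m%d")
```

Whether this is fast enough on 2 million rows depends on how many distinct ind_id values there are, so it is worth timing on a subset first.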
Maybe working with data.table would be more efficient, but I couldn't find any post that helps.
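A grouped assignment in data.table might look roughly like the sketch below. It assumes the data.table package is installed and that gift_date is a Date; the sample rows are invented, and an existing data frame would first need converting with setDT(NewGifts). The := operator combined with by = ind_id adds both columns by reference, one group at a time.

```r
library(data.table)

# Hypothetical sample data standing in for NewGifts (values are invented)
NewGifts <- data.table(
  Gift_ID   = 1:5,
  ind_id    = c(10, 10, 20, 20, 20),
  gift_date = as.Date(c("2015-03-01", "2014-01-15", "2016-07-04",
                        "2016-02-10", "2017-12-25"))
)

# For each ind_id, compute the earliest and latest gift_date once;
# the single value per group is recycled to every row of that group
NewGifts[, c("min", "max") := list(
  format(min(gift_date), "%Y%m%d"),
  format(max(gift_date), "%Y%m%d")
), by = ind_id]
```

This does not rely on the rows being ordered by ind_id and gift_date, unlike the loops.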