Wednesday, 25 September 2019

R loop too long

I am dealing with a 2M-row DB and my for loop is taking too long.

The database has three variables with data (Gift_ID, ind_id and gift_date) and two empty variables, min and max, which I want to fill in. For each distinct ID (ind_id) I want to identify the first gift date and the last gift date. I've written these two loops, which work as long as the DB is ordered by ind_id and then gift_date, but when I run them on the whole DB they take too long. Any ideas for another way of writing this?

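# Forward pass: carry the first gift date of each ind_id down its rows
# (row 1 is set explicitly because the loop starts at row 2)
NewGifts$min[1] = format.Date(NewGifts$gift_date[1], '%Y%m%d')
for (i in 2:length(NewGifts[,1])){
  if(NewGifts$ind_id[i] != NewGifts$ind_id[i-1]){
    NewGifts$min[i] = format.Date(NewGifts$gift_date[i], '%Y%m%d')
  } else {
    NewGifts$min[i] = NewGifts$min[i-1]
  }
}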


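# Backward pass: carry the last gift date of each ind_id up its rows
# (the last row is set explicitly because the loop starts one row above it)
n = length(NewGifts[,1])
NewGifts$max[n] = format.Date(NewGifts$gift_date[n], '%Y%m%d')
for (i in ((n-1):1)){
  if(NewGifts$ind_id[i] != NewGifts$ind_id[i+1]){
    NewGifts$max[i] = format.Date(NewGifts$gift_date[i], '%Y%m%d')
  } else {
    NewGifts$max[i] = NewGifts$max[i+1]
  }
}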

Maybe working with data.table is more efficient, but I couldn't find any post that helps.
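
For reference, one loop-free way to get the same two columns might be a grouped min/max in base R. This is only a sketch: the small NewGifts data frame below is an invented stand-in for the real 2M-row table, and it assumes gift_date is already of class Date. It has not been timed on data of that size.

```r
# Toy stand-in for NewGifts (values are invented for illustration)
NewGifts <- data.frame(
  Gift_ID   = 1:5,
  ind_id    = c(1, 1, 2, 2, 2),
  gift_date = as.Date(c("2019-01-05", "2019-03-01", "2018-07-10",
                        "2018-12-25", "2019-02-14"))
)

# Format the whole column once; 'YYYYMMDD' strings sort chronologically,
# so min/max on the strings give the first and last gift date
d <- format(NewGifts$gift_date, "%Y%m%d")

# ave() applies FUN within each ind_id group and returns a vector
# of the same length as the input, so it can be assigned directly
NewGifts$min <- ave(d, NewGifts$ind_id, FUN = min)
NewGifts$max <- ave(d, NewGifts$ind_id, FUN = max)

print(NewGifts)
```

Unlike the loops, this does not rely on the rows being ordered by ind_id and gift_date, and it avoids assigning into the data frame one row at a time. With data.table, the same grouping could presumably be written as NewGifts[, c("min", "max") := .(min(gift_date), max(gift_date)), by = ind_id] after setDT(NewGifts), which would keep the results as dates rather than formatted strings.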
