Friday, 20 October 2017

Create a new column with a for loop and if/else statement

I'm trying to create a new column called "nintydayinterval" in my data set z. Another column, z$Date, holds a numeric date (in Excel's serial format). I want to derive a variable from it based on approximately 90-day intervals.

If z$Date < 42460, then z$nintydayinterval should be 0; if z$Date < 42550, then z$nintydayinterval should be 1; and so on, up to 9.
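
The rule above amounts to binning against fixed breakpoints spaced 90 days apart. Here is a minimal sketch, assuming the breakpoints run from 42460 to 43180 as in the loop further down. It uses a small made-up vector `dates` in place of the real column. Base R's findInterval() returns, for each value, the number of breakpoints less than or equal to it, which lines up with the codes 0 to 9:

```r
# Breakpoints taken from the thresholds in the question: 42460, 42550, ..., 43180
breaks <- seq(42460, 43180, by = 90)

# Hypothetical sample values standing in for z$Date (already numeric here)
dates <- c(42459, 42460, 42549, 42550, 42735, 42961, 43180, 50000)

# findInterval() counts the breakpoints <= each value, giving codes 0..9
interval <- findInterval(dates, breaks)
print(interval)
```

Because the call is vectorized, it handles the whole column in one pass instead of one iteration per row.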

I've tried several methods and nothing works; my current version never completes. Any idea how to get this to run smoothly? I was using the lubridate package, but it won't work with a data set this large.

Note: I'm running this over 17 million rows of data, so efficiency is important. I've written similar if statements before without problems, but I can't figure this one out.

z$nintydayinterval <- NA
x <- z$nintydayinterval
y <- z$Date

n <- 17007029
for (i in 1:n) {
  if (y[i] < 42460) {
    x[y] <- 0
  } else if (y[i] < 42550) {
    x[y] <- 1
  } else if (y[i] < 42640) {
    x[y] <- 2
  } else if (y[i] < 42730) {
    x[y] <- 3
  } else if (y[i] < 42820) {
    x[y] <- 4
  } else if (y[i] < 42910) {
    x[y] <- 5
  } else if (y[i] < 43000) {
    x[y] <- 6
  } else if (y[i] < 43090) {
    x[y] <- 7
  } else if (y[i] < 43180) {
    x[y] <- 8
  } else {
    x[y] <- 9
  }
}

> str(z)
'data.frame':   17007029 obs. of  4 variables:
 $ Search          : Factor w/ 109505 levels "5c4feef",..: 1 1 1 1 1 1 1 1 ...
 $ Event           : Factor w/ 85 levels "Arcnet","Boot",..: 2 22 6 6 6 6 22 22 
 $ Date            : chr  "42961" "42961" "42735" "42735" ...
 $ nintydayinterval: logi  NA NA NA NA NA NA ...
> 
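
One thing the str() output makes visible is that Date is stored as chr rather than as a number. When R compares a character value with a number, it coerces the number to a string and compares them as text. In the same way, an index like x[y] uses the whole character vector y rather than the single position i. The sketch below uses made-up values (`date_chr` is a hypothetical stand-in for z$Date) and assumes every entry parses as a number. It shows the difference and a one-time conversion:

```r
# Date as it appears in the str() output: a character vector
date_chr <- c("42961", "42961", "42735", "42735")

# Character vs numeric: the number is coerced to a string,
# so this compares "9999" with "42460" as text
text_cmp <- "9999" < 42460

# Numeric vs numeric: an ordinary numeric comparison
num_cmp <- 9999 < 42460

# Convert once up front; entries that are not numbers would become NA
date_num <- as.numeric(date_chr)

# Vectorized comparison over the whole column, no loop needed
below_first_break <- date_num < 42460

print(c(text_cmp = text_cmp, num_cmp = num_cmp))
print(below_first_break)
```

Converting the column once before any comparison keeps the later logic purely numeric, whichever binning approach is used.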
