Monday, 19 September 2016

R: for loop creating new columns populated by conditional statement based on the previous column

My [simplified] data looks like this:

id = 1:50
first_active = sample(1:20, 50, replace = TRUE)
df = data.frame(id, first_active)

# Add 35 weekly activity flags (0 = inactive, 1 = active)
for(i in 1:35) {
  df[paste0("week", i)] = sample(0:1, 50, replace = TRUE)
}

I'm trying to write a for loop that creates the variables p1, p2, ..., p35 and populates them as follows:

(example for the p4 column; the same pattern should apply to every column p1-p35):

df %>% 
  mutate(
  p4 = ifelse(week4 > 0, "active", 
                 ifelse(first_order<4 & p3 == "lapsed2", "lapsed3",
                        ifelse(first_order<4 & p3 == "lapsed1", "lapsed2",
                               ifelse(first_order<4 & p3 == "active", "lapsed1", "NA")))))
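
The p4 rule above can probably be generalised to every week with a plain loop that reads the previous p column on each pass. The following is only a minimal sketch: `add_status_columns` and `next_status` are hypothetical names, it assumes that `first_order` in the example refers to the `first_active` column of the simplified data, that p1 has no previous status to look at, and that a missing status is a real NA (as in the desired output) rather than the string "NA".

```r
# Hypothetical helper (not from the question): adds p1..p<n_weeks> to `data`.
# Assumes columns first_active and week1..week<n_weeks> exist.
add_status_columns <- function(data, n_weeks) {
  # Status that follows an inactive week, keyed by the previous status.
  next_status <- c(active = "lapsed1", lapsed1 = "lapsed2", lapsed2 = "lapsed3")
  for (i in seq_len(n_weeks)) {
    week <- data[[paste0("week", i)]]
    # There is no p0, so week 1 starts from an all-NA previous column.
    prev <- if (i == 1) rep(NA_character_, nrow(data)) else data[[paste0("p", i - 1)]]
    # Lookup by name; yields NA when prev is NA or "lapsed3".
    lapsed <- unname(next_status[prev])
    # Only lapse when first_active < i, as in the p4 example.
    lapsed[data$first_active >= i] <- NA
    data[[paste0("p", i)]] <- ifelse(week > 0, "active", lapsed)
  }
  data
}

# Usage on simulated data shaped like the question's df.
set.seed(1)
df <- data.frame(id = 1:50, first_active = sample(1:20, 50, replace = TRUE))
for (i in 1:35) df[[paste0("week", i)]] <- sample(0:1, 50, replace = TRUE)
df <- add_status_columns(df, 35)
```

Using `[[` with a single column name returns a plain vector, so every comparison inside the loop works on vectors of the same length. The named-vector lookup replaces the chain of nested ifelse calls, which should make it easier to add a further lapsed state if one is needed.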

In essence, the outcome should look like this, for columns p1-p35:

reference data

head(markov_df[,1:37])

  id first_active week1 week2 week3 week4 week5 week6 week7 week8 week9 week10 week11 week12 week13 week14 week15 week16
   1           14     1     0     1     1     0     0     0     0     0      1      1      0      0      0      0      0
   2            3     0     1     1     0     1     0     0     1     0      1      0      0      1      0      0      1
   3            1     1     1     1     0     1     0     0     0     1      0      0      0      1      0      1      0
   4            3     0     1     0     1     1     0     1     1     1      0      1      1      1      0      0      0
   5            1     1     0     1     1     0     1     1     0     1      1      1      0      0      0      1      1
   6           14     0     1     1     0     0     1     0     1     0      1      1      1      1      0      0      1
week17 week18 week19 week20 week21 week22 week23 week24 week25 week26 week27 week28 week29 week30 week31 week32 week33
   1      0      0      1      1      1      1      1      1      1      0      1      1      1      0      1      1      0
   2      0      0      1      0      0      1      1      0      1      0      1      1      1      0      0      0      1
   3      1      1      1      0      1      1      1      1      0      0      0      1      1      0      1      1      0
   4      0      1      0      1      1      0      1      0      1      1      1      0      1      0      0      0      0
   5      0      1      1      1      0      1      0      1      0      0      1      1      1      1      1      0      1
   6      0      0      1      1      0      0      0      0      0      1      0      1      0      1      1      1      1
  week34 week35
    0      0
    1      1
    0      1
    0      0
    1      0
    0      0

desired output data (cols 1 - 15)

head(markov_df[,38:52])

    p1      p2     p3      p4      p5      p6      p7      p8      p9     p10     p11     p12     p13     p14     p15
active      NA active  active      NA      NA      NA      NA      NA  active  active      NA      NA      NA      NA
    NA  active active lapsed1  active lapsed1 lapsed2  active lapsed1  active lapsed1 lapsed2  active lapsed1 lapsed2
active  active active lapsed1  active lapsed1 lapsed2 lapsed3  active lapsed1 lapsed2 lapsed3  active lapsed1  active
    NA  active     NA  active  active lapsed1  active  active  active lapsed1  active  active  active lapsed1 lapsed2
active lapsed1 active  active lapsed1  active  active lapsed1  active  active  active lapsed1 lapsed2 lapsed3  active
    NA  active active      NA      NA  active      NA  active      NA  active  active  active  active      NA      NA

What I managed to get so far is:

i = 2:35
j = 1:34
for(ind in seq_along(i)) {
markov_function[paste0("p", i,sep="")] = ifelse(markov_function[paste0("week", i, sep="")]> 0, "active",
                                                  ifelse(markov_function["first_order"] < i & markov_function[paste0("week",   i-1, sep="")] == paste0("lapsed", j, sep=""),
                                                       paste0("lapsed", j+1, sep=""), NA))
}

but I get the error:

Error in ifelse(markov_function["first_order"] < i & markov_function[paste0("week",  : 
  binary operation on non-conformable arrays

I suspect I'm missing something basic. I'd be grateful for any help here, thanks! Kasia
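
One likely reading of the error above is that, inside the loop, `i` and `j` are still the whole vectors 2:35 and 1:34 rather than single values, so `markov_function[paste0("week", i)]` selects many columns at once while `markov_function["first_order"]` selects one. The two comparisons then have different shapes and `&` cannot combine them. The sketch below reproduces that mismatch on a tiny data frame; `toy`, `many`, `one`, `res` and `ok` are made-up names used only for illustration.

```r
# Tiny stand-in for markov_function (hypothetical values).
toy <- data.frame(first_order = c(1, 3),
                  week1 = c(0, 1), week2 = c(1, 0), week3 = c(1, 1))
i <- 2:3

# A vector of names selects several columns: the comparison is a 2 x 2 matrix.
many <- toy[paste0("week", i)] > 0
# A single column selected with `[` gives a 2 x 1 matrix.
one <- toy["first_order"] < i

# Combining the two shapes with `&` raises an error; keep its message.
res <- tryCatch(one & many, error = function(e) conditionMessage(e))

# Indexing the loop vector with `ind` and extracting columns with `[[`
# gives plain vectors of equal length, which combine without trouble.
ind <- 1
ok <- toy[[paste0("week", i[ind])]] > 0 & toy[["first_order"]] < i[ind]
```

Separately, the attempt compares the previous week column with "lapsed1", "lapsed2" and so on, whereas the p4 example compares the previous p column, so the loop probably needs `paste0("p", i[ind] - 1)` at that point.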
