dimanche 25 septembre 2016

R: passing a value over multiple rows, until a value in a different column changes

I've got the customer activity data in the following format:

id = rep(1:2, each = 7)
week = rep(22:28, 2)
orders_num = c(NA, 1,0,0,0,3,0,1,4,0,0,1,0,0)
orders_char = c("none", "1", "none", "none", "none", "2+", "none", "1", "2+", "none", "none","1", "none", "none")
activity =c(NA, "active", "lapsed1", "lapsed2", "lapsed3", "active", "lapsed1", "active", "active", "lapsed1", "lapsed2" , "active", "lapsed1", "lapsed2")

df = data.frame(cbind(id, week, orders_num, orders_char, activity))
df
id week orders_num orders_char activity
 1   22       <NA>        none     <NA>
 1   23          1           1   active
 1   24          0        none  lapsed1
 1   25          0        none  lapsed2
 1   26          0        none  lapsed3
 1   27          3          2+   active
 1   28          0        none  lapsed1
 2   22          1           1   active
 2   23          4          2+   active
 2   24          0        none  lapsed1
 2   25          0        none  lapsed2
 2   26          1           1   active
 2   27          0        none  lapsed1
 2   28          0        none  lapsed2

I'm trying to create a column that would return a combination of activity and orders_char if activity == "active" and a respective 'lapsed' category with orders_char corresponding to the most recent 'active' status:

df_solution
id week orders_num orders_char activity      final
 1   22       <NA>        none     <NA>       <NA>
 1   23          1           1   active   active_1
 1   24          0        none  lapsed1  lapsed1_1
 1   25          0        none  lapsed2  lapsed2_1
 1   26          0        none  lapsed3  lapsed3_1
 1   27          3          2+   active  active_2+
 1   28          0        none  lapsed1 lapsed1_2+
 2   22          1           1   active   active_1
 2   23          4          2+   active  active_2+
 2   24          0        none  lapsed1 lapsed1_2+
 2   25          0        none  lapsed2 lapsed2_2+
 2   26          1           1   active   active_1
 2   27          0        none  lapsed1  lapsed1_1
 2   28          0        none  lapsed2  lapsed2_1

Obviously, simply referring to the previous row is not going to work in cases where lapsed periods last longer:

df %>% group_by(id) %>% arrange(id, week) %>%
mutate(final = ifelse(activity == "active", paste(activity, orders_char, sep="_"),
                    ifelse(grepl("lapsed", activity), paste(activity, lag(orders_char), sep="_"), NA))) 

 id   week orders_num orders_char activity        final
  1     22         NA        none       NA         <NA>
  1     23          1           1   active     active_1
  1     24          0        none  lapsed1    lapsed1_1
  1     25          0        none  lapsed2 lapsed2_none
  1     26          0        none  lapsed3 lapsed3_none
  1     27          3          2+   active    active_2+
  1     28          0        none  lapsed1   lapsed1_2+
  2     22          1           1   active     active_1
  2     23          4          2+   active    active_2+
  2     24          0        none  lapsed1   lapsed1_2+
  2     25          0        none  lapsed2 lapsed2_none
  2     26          1           1   active     active_1
  2     27          0        none  lapsed1    lapsed1_1
  2     28          0        none  lapsed2 lapsed2_none

Is there any way I could "fix" orders_char value from one active period to the other?

Thanks for your suggestions!

Aucun commentaire:

Enregistrer un commentaire