mardi 27 novembre 2018

Conditional Cumulative Sum over the same row in R

I have a dataset like this

dat <- data.frame(Col0 =rep(c("grp1","grp2","grp3", "grp4"), each = 4),
              Col1 = rep(c("B","S","S","B"), 4),
              Col2 = rep(c(1,2,3,4), 4),
              Col3 = rep(c(0.1,0.2,0.3,0.4), 4))

I'm trying to create a fourth column as shown below

dat1 <- data.frame(Col0 =rep(c("grp1","grp2","grp3", "grp4"), each = 4),
               Col1 = rep(c("B","S","S","B"), 4),
               Col2 = rep(c(1,2,3,4), 4),
               Col3 = rep(c(0.1,0.2,0.3,0.4), 4),
               Col4 = c(1, 0.8, 1.26, 4, 1, 0.8, 1.26, 4, 1, 0.8, 1.26, 4))

What I have tried till now,

d1 <- dat %>% 
  group_by(Col0) %>% 
  mutate(Col4 = if_else(Col1 == 'B', Col2,
                        if_else(Col1 == 'S' & lag(Col1 == "B"), lag(Col2)- Col3*lag(Col2), 0)))
d1

The Answer I'm getting is not what is in Col4, which is desired. The condition for getting Col4 is :

 if Col1 is B then get the value of Col2 as it is,

 if Col1 is S & Previous Value of Col1 is B then 1-(0.2*1) which is equal to 0.8
 if Col1 is S & Previous Value of Col1 is S as well then (1+0.8) -((1+0.8)*0.3) which is 1.26

Basically, it's like first performing difference and then performing cumulative sum including the difference and so on.

For now, I have taken a simple example to understand what I'm trying to achieve, the actual data-set has more than 1 million Obs. and Several Thousand Groups and what's worse is that the Combination of 'B' & 'S' alter. Like in some groups it's B,B,S,S and So on...

Any Help on this will be appreciated as I have tried several things other than if_else() and seen many conditional cumulative sum Ques as well but to no avail.

I think the same could be done easily in Excel with SUMIF() Function, but i need to do this with R

Aucun commentaire:

Enregistrer un commentaire