Thursday, October 1, 2020

Group by + New Column + Grab value from the previous row based on conditions

I have this DataFrame:

import pandas as pd

df=pd.DataFrame({'user':[1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4],
                  'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-19','1995-12-20','1995-12-23','1995-12-26','1995-12-27'],
                  'dc':['1995-09-02','1995-09-02','1995-10-02','1995-10-05','1995-10-05','1995-11-05','1995-11-05','1995-11-10','1995-11-10','1995-11-10','1995-12-10','1995-12-23','1995-12-23','1995-12-23','1995-12-23','1995-12-23'],
                  'tp':['s','c','f','s','c','c','f','s','c','s','f','s','s','c','s','f'],
                  'vt':['0','1','0','0','1','0','0','0','1','0','0','0','0','1','0','0'],
                  'c1':['1','5','0','2','3','9','3','2','0','5','5','6','4','0','6','0'],
                  'c2':['3','4','0','2','5','3','8','4','0','6','2','7','0','0','8','0'],
                  'c3':['5','5','2','5','6','4','2','4','4','6','3','4','3','8','2','7']})
df

which gives:

user    date        dc     tp   vt  c1   c2  c3
 1  1995-09-01  1995-09-02  s   0    1   3   5
 1  1995-09-02  1995-09-02  c   1    5   4   5
 1  1995-10-03  1995-10-02  f   0    0   0   2
 2  1995-10-04  1995-10-05  s   0    2   2   5
 2  1995-10-05  1995-10-05  c   1    3   5   6
 2  1995-11-07  1995-11-05  c   0    9   3   4
 2  1995-11-08  1995-11-05  f   0    3   8   2
 3  1995-11-09  1995-11-10  s   0    2   4   4
 3  1995-11-10  1995-11-10  c   1    0   0   4
 3  1995-11-15  1995-11-10  s   0    5   6   6
 3  1995-12-18  1995-12-10  f   0    5   2   3
 4  1995-12-19  1995-12-23  s   0    6   7   4
 4  1995-12-20  1995-12-23  s   0    4   0   3
 4  1995-12-23  1995-12-23  c   1    0   0   8
 4  1995-12-26  1995-12-23  s   0    6   8   2
 4  1995-12-27  1995-12-23  f   0    0   0   7

I want to create a new column df['dc2'], computed per user (group by user), where by default df['dc2'] = df['dc']. However, if the user has an entry that meets the conditions 'tp'='c' & 'vt'=1 & 'c1'=0 & 'c2'=0, then df['dc2'] should take the date of the previous entry (the previous row of that user).

For example, for user 3, the entry with 'tp'='c' & 'vt'=1 has 'c1'=0 and 'c2'=0, so the value of df['dc2'] will be (for user 3) '1995-11-09' instead of '1995-11-10'.

Likewise, for user 4, the entry with 'tp'='c' & 'vt'=1 has 'c1'=0 and 'c2'=0, so in this case df['dc2'] should be (for user 4) '1995-12-20' instead of '1995-12-23'.

Here is the desired result:

user    date       dc           dc2     tp   vt c1  c2  c3
1   1995-09-01  1995-09-02  1995-09-02   s   0   1   3   5
1   1995-09-02  1995-09-02  1995-09-02   c   1   5   4   5
1   1995-10-03  1995-10-02  1995-10-02   f   0   0   0   2
2   1995-10-04  1995-10-05  1995-10-05   s   0   2   2   5
2   1995-10-05  1995-10-05  1995-10-05   c   1   3   5   6
2   1995-11-07  1995-11-05  1995-11-05   c   0   9   3   4
2   1995-11-08  1995-11-05  1995-11-05   f   0   3   8   2
3   1995-11-09  1995-11-10  1995-11-09   s   0   2   4   4
3   1995-11-10  1995-11-10  1995-11-09   c   1   0   0   4
3   1995-11-15  1995-11-10  1995-11-09   s   0   5   6   6
3   1995-12-18  1995-12-10  1995-12-09   f   0   5   2   3
4   1995-12-19  1995-12-23  1995-12-20   s   0   6   7   4
4   1995-12-20  1995-12-23  1995-12-20   s   0   4   0   3
4   1995-12-23  1995-12-23  1995-12-20   c   1   0   0   8
4   1995-12-26  1995-12-23  1995-12-20   s   0   6   8   2
4   1995-12-27  1995-12-23  1995-12-20   f   0   0   0   7
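
One possible way to get this table with pandas is sketched below. It rests on an assumed reading of the rule, not one stated in the question: for each user, take the first row that meets the conditions, measure the gap between its 'dc' and the 'date' of the previous row of that user, and shift every 'dc' of that user back by that gap. This reading was chosen because it also produces the '1995-12-09' value shown for user 3's last row. Users with no matching row keep dc2 = dc. The helper names flag, prev_date, gap and user_gap are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4],
    'date': ['1995-09-01', '1995-09-02', '1995-10-03', '1995-10-04',
             '1995-10-05', '1995-11-07', '1995-11-08', '1995-11-09',
             '1995-11-10', '1995-11-15', '1995-12-18', '1995-12-19',
             '1995-12-20', '1995-12-23', '1995-12-26', '1995-12-27'],
    'dc': ['1995-09-02', '1995-09-02', '1995-10-02', '1995-10-05',
           '1995-10-05', '1995-11-05', '1995-11-05', '1995-11-10',
           '1995-11-10', '1995-11-10', '1995-12-10', '1995-12-23',
           '1995-12-23', '1995-12-23', '1995-12-23', '1995-12-23'],
    'tp': ['s', 'c', 'f', 's', 'c', 'c', 'f', 's',
           'c', 's', 'f', 's', 's', 'c', 's', 'f'],
    'vt': ['0', '1', '0', '0', '1', '0', '0', '0',
           '1', '0', '0', '0', '0', '1', '0', '0'],
    'c1': ['1', '5', '0', '2', '3', '9', '3', '2',
           '0', '5', '5', '6', '4', '0', '6', '0'],
    'c2': ['3', '4', '0', '2', '5', '3', '8', '4',
           '0', '6', '2', '7', '0', '0', '8', '0'],
    'c3': ['5', '5', '2', '5', '6', '4', '2', '4',
           '4', '6', '3', '4', '3', '8', '2', '7'],
})

# Parse the date strings so that date arithmetic works.
df['date'] = pd.to_datetime(df['date'])
df['dc'] = pd.to_datetime(df['dc'])

# Rows that meet the conditions. vt, c1 and c2 are stored as strings
# in the question, so they are compared against strings here.
flag = (df['tp'].eq('c') & df['vt'].eq('1')
        & df['c1'].eq('0') & df['c2'].eq('0'))

# 'date' of the previous row within the same user (NaT on a user's first row).
prev_date = df.groupby('user')['date'].shift()

# Gap between dc and the previous row's date, kept only on flagged rows.
gap = (df['dc'] - prev_date).where(flag)

# Assumption: use the first gap found per user for all rows of that user;
# users without a flagged row get a zero shift.
user_gap = gap.groupby(df['user']).transform('first').fillna(pd.Timedelta(0))

df['dc2'] = df['dc'] - user_gap
print(df[['user', 'date', 'dc', 'dc2', 'tp', 'vt', 'c1', 'c2', 'c3']])
```

If the intended rule is instead to replace only the rows that share the flagged row's 'dc', the last step would need to change, and user 3's last row would then stay at '1995-12-10'.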
