If there is a dataframe:
import pandas as pd
import numpy as np
users=pd.DataFrame(
[
{'id':1,'date':'01/01/2019', 'transaction_total':-1, 'balance_total':102},
{'id':1,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
{'id':1,'date':'01/03/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
{'id':1,'date':'01/04/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
{'id':1,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':np.nan},
{'id':2,'date':'01/01/2019', 'transaction_total':-2, 'balance_total':200},
{'id':2,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
{'id':2,'date':'01/04/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
{'id':2,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':np.nan}
]
)
print(users[['id','date','balance_total','transaction_total']])
Dataframe:
id date balance_total transaction_total
0 1 01/01/2019 102.0 -1.0
1 1 01/02/2019 100.0 -2.0
2 1 01/03/2019 NaN NaN
3 1 01/04/2019 NaN NaN
4 1 01/05/2019 NaN -4.0
5 2 01/01/2019 200.0 -2.0
6 2 01/02/2019 100.0 -2.0
7 2 01/04/2019 NaN NaN
8 2 01/05/2019 NaN -4.0
How can i do the following?
If both of the transaction_total and balance_total are NaN, just fill in the last date's balance_total (e.g. in row 3 where id=1, since the user1's transaction_total and balance_total are NaN, fill in 100 from 01/02/2019. The same will be row 4, fill in 100 from 01/03/2019.)
If the transaction_total is NOT NaN, but balance_total is NaN, do the math of the previous date's balance_total+ the current row's date's transaction_total.
In user 1, 01/05/2019 as example: the balance total will be=100+(-4), where 100 is 01/04/2019's balance total, and (-4) is 01/05/2019's transaction total.
Desired output:
id date balance_total transaction_total
0 1 01/01/2019 102.0 -1.0
1 1 01/02/2019 100.0 -2.0
2 1 01/03/2019 100.0 NaN
3 1 01/04/2019 100.0 NaN
4 1 01/05/2019 96.0 -4.0
5 2 01/01/2019 200.0 -2.0
6 2 01/02/2019 100.0 -2.0
7 2 01/04/2019 100.0 NaN
8 2 01/05/2019 96.0 -4.0
here is my code but it doesn't work. I think i couldn't figure out how to do "if logic in pandas when a row is null, do something".
for i, row in df.iterrows():
if(pd.isnull(row['transaction_total'] is True)):
if(pd.isnull(row['balance_total'] is True)):
df.loc[i,'transaction_total'] = df.loc[i-1,'transaction_total']
Could someone enlighten?
Aucun commentaire:
Enregistrer un commentaire