lundi 24 février 2020

python 3.6 pandas conditionally filling missing values

If there is a dataframe:

import pandas as pd
import numpy as np

users=pd.DataFrame(
                [
                {'id':1,'date':'01/01/2019', 'transaction_total':-1, 'balance_total':102},
                {'id':1,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
                {'id':1,'date':'01/03/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
                {'id':1,'date':'01/04/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
                {'id':1,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':np.nan},
                {'id':2,'date':'01/01/2019', 'transaction_total':-2, 'balance_total':200},
                {'id':2,'date':'01/02/2019', 'transaction_total':-2, 'balance_total':100},
                {'id':2,'date':'01/04/2019', 'transaction_total':np.nan, 'balance_total':np.nan},
                {'id':2,'date':'01/05/2019', 'transaction_total':-4, 'balance_total':np.nan}  
                ]
                )

print(users[['id','date','balance_total','transaction_total']])

Dataframe:

   id        date  balance_total  transaction_total
0   1  01/01/2019          102.0               -1.0
1   1  01/02/2019          100.0               -2.0
2   1  01/03/2019            NaN                NaN
3   1  01/04/2019            NaN                NaN
4   1  01/05/2019            NaN               -4.0
5   2  01/01/2019          200.0               -2.0
6   2  01/02/2019          100.0               -2.0
7   2  01/04/2019            NaN                NaN
8   2  01/05/2019            NaN               -4.0

How can i do the following?

If both of the transaction_total and balance_total are NaN, just fill in the last date's balance_total (e.g. in row 3 where id=1, since the user1's transaction_total and balance_total are NaN, fill in 100 from 01/02/2019. The same will be row 4, fill in 100 from 01/03/2019.)

If the transaction_total is NOT NaN, but balance_total is NaN, do the math of the previous date's balance_total+ the current row's date's transaction_total.

In user 1, 01/05/2019 as example: the balance total will be=100+(-4), where 100 is 01/04/2019's balance total, and (-4) is 01/05/2019's transaction total.

Desired output:

    id        date  balance_total  transaction_total
0   1  01/01/2019          102.0               -1.0
1   1  01/02/2019          100.0               -2.0
2   1  01/03/2019          100.0                NaN
3   1  01/04/2019          100.0                NaN
4   1  01/05/2019           96.0               -4.0
5   2  01/01/2019          200.0               -2.0
6   2  01/02/2019          100.0               -2.0
7   2  01/04/2019          100.0                NaN
8   2  01/05/2019           96.0               -4.0

here is my code but it doesn't work. I think i couldn't figure out how to do "if logic in pandas when a row is null, do something".

for i, row in df.iterrows():

    if(pd.isnull(row['transaction_total'] is True)): 

        if(pd.isnull(row['balance_total'] is True)): 

            df.loc[i,'transaction_total'] = df.loc[i-1,'transaction_total']

Could someone enlighten?

Aucun commentaire:

Enregistrer un commentaire