if-statement: Python pandas if statement based on two conditions

lundi 16 avril 2018

Python pandas if statement based on two conditions

Piggy backing off this question python pandas new column categorization based on conditions in other columns

#code to re-create my example date
df = pd.DataFrame({'customer_id': ['abc','abc','xyz','xyz','xyz','xyz','thr','thr','abc','abc'],
                   'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678','E789','E789'], 
                   'product_id': [255472, 251235, 253764,257344,221577,209809,223551,290678,908354,909238],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X','Y','Z']})

#example data
customer_id|    transaction_id | product_id | product_category
abc             A123              255472             X
abc             A123              251235             X
xyz             B345              253764             Y
xyz             B345              257344             Y
xyz             C567              221577             X
xyz             C567              209809             Y
thr             D678              223551             Y
thr             D678              290678             X
abc             E789              908354             Y
abc             E789              909238             Z

I want to flag all customer_ids that have X and Y in different transactions (not in the same transaction).

#expected output
customer_id|    transaction_id | product_id | product_category | flag 
abc             A123              255472             X            1
abc             A123              251235             X            1
xyz             B345              253764             Y            0
xyz             B345              257344             Y            0
xyz             C567              221577             X            0
xyz             C567              209809             Y            0
thr             D678              223551             Y            0
thr             D678              290678             X            0
abc             E789              908354             Y            1
abc             E789              909238             Z            1

I couldn't come up with a clean solution. In the example above, we have customer abc who has a transaction with only product category X and then a transaction with product category Y and Z. This is the customer I want to flag, they had X and Y, but in different transaction_ids.

One way of doing it I thought of would be to use the code from my previous answer:

df['pre_flag']=df.groupby('transaction_id')['product_category'].transform(lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(set(x)))

And then split the dataset into two:

df_1 = df.loc[df['pre_flag'] == 'X&Y'].copy()
df_2 = df.loc[df['pre_flag'] != 'X&Y'].copy()

... and use an isin statement; but this is very messy; there must be a better approach. Thank You!

if-statement

lundi 16 avril 2018

Python pandas if statement based on two conditions

Aucun commentaire:

Enregistrer un commentaire