if-statement: python pandas new column categorization based on conditions in other columns

Monday, April 16, 2018

I am working with the following pandas DataFrame, df:

import pandas as pd

# Sample data: two products per transaction_id
df = pd.DataFrame({'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
                   'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
                   'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X']})

transaction_id | product_id | product_category
A123           | 255472     | X
A123           | 251235     | X
B345           | 253764     | Y
B345           | 257344     | Y
C567           | 221577     | X
C567           | 209809     | Y
D678           | 223551     | Y
D678           | 290678     | X

I need to add another column, transaction_category, that looks at each transaction_id and records which product categories occur within that transaction. This is the output I am looking for:

transaction_id | product_id | product_category | transaction_category
A123           | 255472     | X                | X only
A123           | 251235     | X                | X only
B345           | 253764     | Y                | Y only
B345           | 257344     | Y                | Y only
C567           | 221577     | X                | X & Y
C567           | 209809     | Y                | X & Y
D678           | 223551     | Y                | X & Y
D678           | 290678     | X                | X & Y
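One possible way to produce this column, sketched under the assumption that the label is simply the distinct categories of a transaction, sorted and joined with " & ", with " only" appended when there is just one category: group by transaction_id and use transform, which broadcasts each group's result back onto every row of that group. The helper label_categories is hypothetical, not part of pandas.

```python
import pandas as pd


def label_categories(categories: pd.Series) -> str:
    """Hypothetical helper: turn one transaction's categories into a label."""
    distinct = sorted(categories.unique())
    if len(distinct) == 1:
        # A single category, e.g. "X only"
        return f"{distinct[0]} only"
    # Several categories, e.g. "X & Y"
    return " & ".join(distinct)


df = pd.DataFrame({
    "transaction_id": ["A123", "A123", "B345", "B345", "C567", "C567", "D678", "D678"],
    "product_id": [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    "product_category": ["X", "X", "Y", "Y", "X", "Y", "Y", "X"],
})

# transform() calls the helper once per transaction_id group and repeats the
# returned label on every row of that group, so df keeps its original shape.
df["transaction_category"] = (
    df.groupby("transaction_id")["product_category"].transform(label_categories)
)
print(df)
```

Because only product_category is selected after the groupby, any other columns in the real DataFrame are left untouched; the new column is simply added alongside them.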

Please note that my DataFrame has other columns that I am not using here, so I guess I need to start with a groupby?

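# Count the rows for each (transaction_id, product_category) pair. A plain
# groupby object has no reset_index(), so aggregate first (here with size()).
df2 = df.groupby(['transaction_id', 'product_category']).size().reset_index(name='n_rows')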
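If you prefer to start from a groupby that yields one row per transaction, as the attempt above suggests, a sketch of an alternative is to aggregate each transaction to a single label and then map that label back onto the rows by transaction_id. This assumes the same labeling rule as the desired output (" only" for a single category, " & " between several); the helper name transaction_label is hypothetical.

```python
import pandas as pd


def transaction_label(categories: pd.Series) -> str:
    """Hypothetical helper: one label per transaction from its categories."""
    distinct = sorted(categories.unique())
    return f"{distinct[0]} only" if len(distinct) == 1 else " & ".join(distinct)


df = pd.DataFrame({
    "transaction_id": ["A123", "A123", "B345", "B345", "C567", "C567", "D678", "D678"],
    "product_id": [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    "product_category": ["X", "X", "Y", "Y", "X", "Y", "Y", "X"],
})

# Step 1: collapse to one label per transaction (a Series indexed by transaction_id).
per_transaction = df.groupby("transaction_id")["product_category"].agg(transaction_label)

# Step 2: look up each row's transaction_id in that Series to fill the new column.
df["transaction_category"] = df["transaction_id"].map(per_transaction)
print(per_transaction)
print(df)
```

Compared with transform, agg plus map makes the intermediate per-transaction table explicit, which can be handy if you also want to inspect it or reuse it elsewhere.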