Monday, 16 April 2018

Python pandas: new column categorization based on conditions in other columns

I am working with the following pandas DataFrame df:

import pandas as pd

df = pd.DataFrame({'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678'],
                   'product_id': [255472, 251235, 253764,257344,221577,209809,223551,290678],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X']})

transaction_id | product_id | product_category
A123              255472             X
A123              251235             X
B345              253764             Y
B345              257344             Y
C567              221577             X
C567              209809             Y
D678              223551             Y
D678              290678             X

I need to add another column, "transaction_category", that describes which product categories appear within each transaction_id. This is the output I am looking for:

transaction_id | product_id | product_category | transaction_category
A123              255472             X                X only
A123              251235             X                X only
B345              253764             Y                Y only
B345              257344             Y                Y only
C567              221577             X                X & Y
C567              209809             Y                X & Y
D678              223551             Y                X & Y
D678              290678             X                X & Y

Please note that I have other columns in my dataframe that I am not using, so I guess I need to start with a groupby?

df2 = df.groupby(['transaction_id','product_category']).size().reset_index(name='n_products')
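
One possible way to get the desired column, sketched here under the assumption that the label should be "<category> only" when a transaction contains a single product category and the sorted categories joined with " & " otherwise. The helper name label_categories is hypothetical and not part of pandas. Grouping on transaction_id alone and using transform keeps the original row count, so any unused columns in the dataframe are left untouched:

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})


def label_categories(categories):
    """Build one label from the product categories of a single transaction.

    Hypothetical helper: the labelling rule is an assumption inferred from
    the desired output in the question.
    """
    unique = sorted(set(categories))
    if len(unique) == 1:
        return f"{unique[0]} only"
    return " & ".join(unique)


# transform broadcasts the per-group label back onto every row of the group,
# so the result aligns with the original index of df.
df['transaction_category'] = (
    df.groupby('transaction_id')['product_category'].transform(label_categories)
)

print(df)
```

Because the label is computed per transaction_id and broadcast back, no merge or reset_index step should be needed; with more than two categories the same rule would simply yield a longer joined label.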
