I want to create a new dataframe grouped by store that creates several new columns of summaries for each store. In the example below, for each store, I want to create the four variables. The variables are similar to sumif and countif in excel but calculated for each group (store).
- A variable that sums all product revenues above say $50.
- A variable that sums all product revenues below $50
- A variable that counts the number of products with revenues above $50
- A variable that counts the number of products with revenues below $50.
I have tried various forms of groupby and agg function. I perceive that I should use lambda but I am unsure how I would strutcure that syntax in order to create multiple variables at one time.
import pandas as pd
import numpy as np
n = 20
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
'Revenue': (np.random.random(n) * 50 + 10).round(2)
})
df
So I can easily calculate multiple summary measures on Revenue. I don't know how to create custom summary measures like sumif or countif:
df2 = df.groupby('Store')['Revenue'].agg({'Rev_sum': 'sum', 'Rev_max': 'max'})
For each store (i.e. store_1, store_2) I want four variables in the new dataframe as described above (i.e. Rev_sum_great_50 , etc...)
Aucun commentaire:
Enregistrer un commentaire