I have a dataset on gym activity that looks like the following:
VisitNum  Date      Trainer  CaloriesBurned
1         01/04/20  Mike     500
2         01/06/20  Cindy    600
3         01/07/20  Lucy     550
4         01/10/20  Mike     650
5         01/15/20  Lucy     625
6         01/16/20  Lucy     575
7         01/19/20  Mike     525
8         01/21/20  Rebecca  592
9         01/26/20  Lucy     603
10        01/29/20  Mike     559
My goal is to draw boxplots comparing the calories burned by trainer. This is just a snapshot of the data; there are more than 30 different trainers. I don't want to include every trainer in the plot, so I want to create a new variable, "Trainer2", based on the number of visits per trainer: if a trainer has fewer than 3 visits, the value of Trainer2 should be "Other"; otherwise it should be the trainer's name.
This is my attempt so far:
if data["Trainer"].value_counts() >= 3:
    data["Trainer2"]==data.Trainer
else:
    data["Trainer2"]=="Other"
I'm getting an error when I run this code and I'm not sure what I'm doing wrong. Can someone help point me in the right direction?
Thank you!
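One possible direction, offered as a minimal sketch rather than the only way to do it: the `if` statement fails because `data["Trainer"].value_counts() >= 3` produces a whole Series of booleans (one per trainer), and pandas raises a ValueError when a Series is used as a single true/false condition. In addition, `==` compares values and does not assign anything. The sketch below rebuilds the sample rows from the question, maps each row to its trainer's visit count, and uses `Series.where` to keep the name or substitute "Other". The name `visits_per_trainer` is introduced here for illustration only.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset has more trainers.
data = pd.DataFrame({
    "VisitNum": list(range(1, 11)),
    "Date": ["01/04/20", "01/06/20", "01/07/20", "01/10/20", "01/15/20",
             "01/16/20", "01/19/20", "01/21/20", "01/26/20", "01/29/20"],
    "Trainer": ["Mike", "Cindy", "Lucy", "Mike", "Lucy",
                "Lucy", "Mike", "Rebecca", "Lucy", "Mike"],
    "CaloriesBurned": [500, 600, 550, 650, 625, 575, 525, 592, 603, 559],
})

# value_counts() gives one count per trainer; map() looks that count up
# for every row, so the result lines up with the original DataFrame.
# (visits_per_trainer is a hypothetical helper name, not from the question.)
visits_per_trainer = data["Trainer"].map(data["Trainer"].value_counts())

# where() keeps the trainer name on rows where the condition is True
# and fills in "Other" everywhere else. Note the single "=" for assignment.
data["Trainer2"] = data["Trainer"].where(visits_per_trainer >= 3, "Other")

print(data[["Trainer", "Trainer2"]])
```

With the new column in place, and assuming matplotlib is installed, something like `data.boxplot(column="CaloriesBurned", by="Trainer2")` should give one box per remaining trainer plus one for "Other".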