Looking at a pandas DataFrame containing information on all Olympic athletes for the past 120 years (Name, Weight, Country, Sport, etc.). The data is available at https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results#athlete_events.csv.
Attempting to write a for loop that iterates through the df rows, checks the value stored in the 'Sport' column against several lists, and then adds a column to the df holding the parent category for that row. Code so far:
aquatic_sports = ['Swimming','Diving','Synchronized Swimming','Water Polo']
track_sports = ['Athletics','Modern Pentathlon','Triathlon','Biathlon','Cycling']
team_sports = ['Softball','Basketball','Volleyball','Beach Volleyball','Handball','Rugby','Lacrosse']
gymnastic_sports = ['Gymnastics','Rhythmic Gymnastics','Trampolining']
fitness_sports = ['Weightlifting']
combat_sports = ['Boxing','Judo','Wrestling','Taekwondo']
winter_sports = ['Short Track Speed Skating','Ski Jumping','Cross Country Skiing','Luge','Bobsleigh','Alpine Skiing','Curling','Snowboarding','Ice Hockey','Hockey','Speed Skating']
for index, row in df.iterrows():
    if df.iloc[0,11] in aquatic_sports:
        df['Sport Category'] = 'Aquatic Sport'
    elif df.iloc[0,11] in track_sports:
        df['Sport Category'] = 'Track Sport'
    elif df.iloc[0,11] in gymnastic_sports:
        df['Sport Category'] = 'Gymnastic Sport'
    elif df.iloc[0,11] in fitness_sports:
        df['Sport Category'] = 'Fitness Sport'
    elif df.iloc[0,11] in combat_sports:
        df['Sport Category'] = 'Combat Sport'
    elif df.iloc[0,11] in winter_sports:
        df['Sport Category'] = 'Winter Sport'
No errors are thrown, but unfortunately all values in the new column are the same. Unsure how to pass the current index so that each iteration returns a unique, correct value.
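For what it's worth, the symptom follows from two things in the loop: `df.iloc[0,11]` always reads the first row regardless of the iteration, and `df['Sport Category'] = '...'` assigns one value to the entire column. Below is a minimal sketch of one possible way around both, using a sport-to-category lookup and `Series.map` instead of a loop. The small `df` here is a hypothetical stand-in for the real data, the `categories` dict and `sport_to_category` names are made up for illustration, and the lists are abbreviated.

```python
import pandas as pd

# Hypothetical stand-in for the real athlete_events.csv data
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Sport': ['Swimming', 'Judo', 'Luge', 'Archery'],
})

# Category name -> list of sports (abbreviated from the lists above)
categories = {
    'Aquatic Sport': ['Swimming', 'Diving', 'Synchronized Swimming', 'Water Polo'],
    'Combat Sport': ['Boxing', 'Judo', 'Wrestling', 'Taekwondo'],
    'Winter Sport': ['Ski Jumping', 'Luge', 'Bobsleigh', 'Curling'],
}

# Invert the dict so each sport points to its category
sport_to_category = {
    sport: category
    for category, sports in categories.items()
    for sport in sports
}

# Look up every row's 'Sport' at once; sports not in any list become NaN
df['Sport Category'] = df['Sport'].map(sport_to_category)
print(df)
```

If an explicit loop is preferred, the per-row equivalent would be to test `row['Sport']` instead of `df.iloc[0,11]` and to write with `df.at[index, 'Sport Category'] = ...` so only the current row is set.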