I want to create a new column in a pandas DataFrame and have tried several options. The first two options work, while the third raises the error "The truth value of a Series is ambiguous". I would like to understand what is wrong with the third option and, more generally, which implementation is best or most efficient (for a large DataFrame in real-world use).
import pandas as pd

raw_data1 = {'id': [1,2,3,5],
             'age': [0, 5, 10, 2]}
df = pd.DataFrame(raw_data1, columns = ['id','age'])
option 1, works
def set_color(row):
    if row["id"] == 1 and row["age"] ==0:
        return "red"
    elif row["age"] == 10:
        return "blue"
    else:
        return "green"
df = df.assign(color=df.apply(set_color, axis=1))
option 2, works
df["set_color"] = "green"
df.loc[(df.id == 1) & (df.age == 0), 'set_color'] = 'red'
df.loc[(df.age == 10), 'set_color'] = 'blue'
option 3, raising error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
def set_color (db,x,y,z):
    if db[x]==1 and db[y]==0:
        db[z]="red"
    elif db[y] == 10:
        db[z]="blue"
    else:
        db[z]="green"
set_color(df,'id','age','set_color')
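For reference, here is a minimal sketch of what seems to go wrong in option 3 and one possible vectorized rewrite. In option 3, db[x] is the whole column, so db[x]==1 is a boolean Series rather than a single True/False, and Python's if/and cannot reduce it to one value. The rewrite below keeps the same function signature but uses numpy.select, which is my substitution and not part of the original attempts. Vectorized approaches like this one or option 2 usually scale better on a large DataFrame than a row-wise apply, though that is worth timing on real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 5], "age": [0, 5, 10, 2]})

# Comparing a whole column yields a boolean Series, not a single bool.
mask = df["id"] == 1

# Using that Series directly in `if` raises the ValueError from option 3.
try:
    if mask:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True


def set_color(db, x, y, z):
    # Element-wise conditions, combined with & instead of `and`.
    conditions = [
        (db[x] == 1) & (db[y] == 0),
        db[y] == 10,
    ]
    choices = ["red", "blue"]
    # The first matching condition wins; rows matching none get the default.
    db[z] = np.select(conditions, choices, default="green")
    return db


set_color(df, "id", "age", "set_color")
print(df)
```

The conditions are checked in list order, which mirrors the if/elif/else chain of option 1.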