Friday, 10 August 2018

Creating a new variable raises an error in pandas / best option: function vs apply vs loc

I want to create a new variable in a pandas DataFrame and I tried different options. The first two options work, while the third raises the error "The truth value of a Series is ambiguous". I would like to understand what is wrong with the third option and, in general, which implementation is best or most efficient (assuming a large DataFrame in real use).

import pandas as pd

raw_data1 = {'id': [1, 2, 3, 5],
             'age': [0, 5, 10, 2]}
df = pd.DataFrame(raw_data1, columns=['id', 'age'])

option 1, works

def set_color(row):
    if row["id"] == 1 and row["age"] == 0:
        return "red"
    elif row["age"] == 10:
        return "blue"
    else:
        return "green"

df = df.assign(color=df.apply(set_color, axis=1))

option 2, works

df["set_color"] = "green"    
df.loc[(df.id == 1) & (df.age == 0), 'set_color'] = 'red'  
df.loc[(df.age == 10), 'set_color'] = 'blue'  
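
A compact variant of option 2, sketched here with `numpy.select`, which is not something the post itself uses. It evaluates each condition on whole columns at once, so it usually scales better on a large DataFrame than a row-wise `apply`. The names `conditions` and `choices` are illustrative. One difference to keep in mind: `np.select` takes the first matching condition, whereas successive `.loc` assignments let the last match win. With the rules here the two conditions cannot both be true, so the result is the same.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 5], "age": [0, 5, 10, 2]})

# One boolean mask per rule, built from whole columns (no Python-level loop).
conditions = [
    (df["id"] == 1) & (df["age"] == 0),  # rule for "red"
    df["age"] == 10,                     # rule for "blue"
]
choices = ["red", "blue"]

# Rows that match no condition fall back to the default value.
df["set_color"] = np.select(conditions, choices, default="green")
```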

option 3, raising error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def set_color (db,x,y,z):

    if db[x]==1 and db[y]==0:
       db[z]="red"
    elif db[y] == 10:
       db[z]="blue"
    else:
       db[z]="green"
set_color(df,'id','age','set_color')
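
A likely explanation for the error in option 3: `db[x] == 1` compares a whole column, so it yields a boolean Series with one value per row, and `if` / `and` need a single True or False. In option 1 the same test works because `apply(..., axis=1)` passes one row at a time, so `row["id"] == 1` is a single value. The sketch below first reproduces the error, then shows one possible way to keep the same function signature using boolean masks, `&`, and `.loc`. The name `set_color_masked` is hypothetical, chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 5], "age": [0, 5, 10, 2]})

# Reproduce the problem: a boolean Series has no single truth value.
try:
    bool(df["id"] == 1)
    message = ""
except ValueError as err:
    message = str(err)


def set_color_masked(db, x, y, z):
    """Fill column z of db from columns x and y using boolean masks."""
    is_red = (db[x] == 1) & (db[y] == 0)  # element-wise AND, not `and`
    is_blue = db[y] == 10
    db[z] = "green"            # default for every row
    db.loc[is_red, z] = "red"
    db.loc[is_blue, z] = "blue"
    return db


set_color_masked(df, "id", "age", "set_color")
```

This modifies `df` in place, like the original option 3 intended, and also returns it so the call can be chained.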
