Wednesday, May 2, 2018

If statement between two or more columns in a dataframe

I'm trying to write a simple conditional: for each row, if a column's value is not 'nan', put that value into a new column in the dataframe.

ID1    ID2
Apple  nan
Orange nan
nan    Pear
nan    Grape

Ideally it would then look like so:

ID1    ID2    MasterID
Apple  nan    Apple
Orange nan    Orange
nan    Pear   Pear
nan    Grape  Grape

I've tried using the following:

df['MasterID'] = ''
df.loc[df['ID1'] != 'nan','MasterID'] = df['ID1']
df.loc[df['ID2'] != 'nan','MasterID'] = df['ID2']

But the last statement just takes priority and overwrites what the previous line set. The same thing happens when I use NumPy's where like this:

df['MasterID'] = np.where(df['ID1'] != 'nan',
                          df['ID1'],
                          df['ID2'])
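One possible explanation, assuming the missing entries are real NaN values rather than the literal string 'nan' (the post does not say which): comparing against the string 'nan' is then True for every row, so each condition selects the whole column. The sample frame below is rebuilt from the table above under that assumption, and `mask_str` / `mask_na` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Sample data from the question, ASSUMING the gaps are real NaN values
df = pd.DataFrame({
    "ID1": ["Apple", "Orange", np.nan, np.nan],
    "ID2": [np.nan, np.nan, "Pear", "Grape"],
})

# Comparing with the string 'nan': NaN is not equal to 'nan', so every row passes
mask_str = df["ID1"] != "nan"

# Testing for missing values instead: only rows with an actual value pass
mask_na = df["ID1"].notna()

print(mask_str.tolist())
print(mask_na.tolist())
```

If this assumption holds, the second `.loc` assignment overwrites every row with `ID2`, and `np.where` always picks `ID1`, which would match the behavior described.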

I would also like an approach that I could extend to three or more columns in the future. I'd appreciate any guidance.
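A minimal sketch of one way this might be done, not necessarily the only or best one: back-fill across the ID columns and keep the first column of the result, which gives the first non-missing value in each row. It assumes each row has at most one value that matters (or that the leftmost one should win), and that any literal 'nan' strings can be safely converted to real missing values first. The list name `id_cols` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID1": ["Apple", "Orange", np.nan, np.nan],
    "ID2": [np.nan, np.nan, "Pear", "Grape"],
})

# If the placeholders are the literal string 'nan', turn them into real NaN
df = df.replace("nan", np.nan)

# Columns to search, in priority order; add more names here for 3+ columns
id_cols = ["ID1", "ID2"]

# bfill(axis=1) pulls the first non-missing value in each row into the
# leftmost column, so taking column 0 yields that value
df["MasterID"] = df[id_cols].bfill(axis=1).iloc[:, 0]

print(df)
```

Extending to more columns should only require adding their names to `id_cols`; the order of the list decides which column wins when a row has more than one value.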
