Tuesday, 15 August 2017

Pandas/Python equivalent of complex ifelse match in R

My goal is to get the pandas equivalent of the R code below:

df1$String_1_check = ifelse(df1$String_1 == df2[match(df1$String_2, df2$String_2), 1], TRUE, FALSE)

For each row n of df1, find the row of df2 whose String_2 matches the String_2 value in row n of df1. If String_1 in row n of df1 equals the first column of that df2 row, the new column String_1_check should be True for row n; otherwise it should be False.

df1 has many instances of the same values in String_1 and String_2, while df2 has only one instance of each possible String_1 value (stored in its first column, String_3). With these sample DataFrames:

import pandas as pd

df1 = pd.DataFrame({'String_1': ['string 1', 'string 1', 'string 2', 'string 3', 'string 1'], 'String_2': ['string a', 'string a', 'string b', 'string a', 'string c']})
df2 = pd.DataFrame({'String_3': ['string 1', 'string 2', 'string 3'], 'String_2': ['string a', 'string b', 'string c']})

   String_1  String_2
0  string 1  string a
1  string 1  string a
2  string 2  string b
3  string 3  string a
4  string 1  string c

   String_3  String_2
0  string 1  string a
1  string 2  string b
2  string 3  string c

The desired output would be:

   String_1  String_2  String_1_check
0  string 1  string a  True
1  string 1  string a  True
2  string 2  string b  True
3  string 3  string a  False
4  string 1  string c  False

I have tried np.where, isin, and pd.match (now deprecated), but haven't found a solution.
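
For illustration, here is a minimal sketch of one way the lookup described above might be written in pandas, using Series.map in place of R's match. It assumes that "the first column of df2" means String_3, as in the sample data. The name lookup is introduced here for illustration only and is not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'String_1': ['string 1', 'string 1', 'string 2', 'string 3', 'string 1'],
    'String_2': ['string a', 'string a', 'string b', 'string a', 'string c'],
})
df2 = pd.DataFrame({
    'String_3': ['string 1', 'string 2', 'string 3'],
    'String_2': ['string a', 'string b', 'string c'],
})

# Hypothetical helper: a Series mapping df2.String_2 -> df2.String_3.
# R's match() returns the first matching position, so keep only the first
# occurrence of each String_2 in case df2 ever contains duplicates.
lookup = df2.drop_duplicates(subset='String_2').set_index('String_2')['String_3']

# For each row of df1, fetch the String_3 value paired with its String_2
# (NaN when df2 has no such String_2), then compare it with String_1.
df1['String_1_check'] = df1['String_1'].eq(df1['String_2'].map(lookup))

print(df1)
```

On the sample data this gives True, True, True, False, False, which matches the desired output. One difference from the R version: when a String_2 value in df1 has no match in df2, the R expression yields NA, whereas this sketch yields False.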
