Friday, November 4, 2016

Conditional statement and split in a DataFrame

I am looking for a conditional statement in Python that searches a specified column for a given value and writes the result to a new column.

Here is an example of my dataset:

OBJECTID    CODE_LITH
1              M4,BO
2              M4,BO
3              M4,BO
4              M1,HP-M7,HP-M1

What I want is the following, for example:

OBJECTID    CODE_LITH           M4   M1
1              M4,BO            1    0
2              M4,BO            1    0
3              M4,BO            1    0
4              M1,HP-M7,HP-M1   0    1

What I have done so far:

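import pandas as pd
import numpy as np

# df is the DataFrame shown above
lookup = ['M4']
# Series.isin matches whole cell values such as 'M4,BO', not individual codes
df.loc[df['CODE_LITH'].isin(lookup), 'M4'] = 1
df.loc[~df['CODE_LITH'].isin(lookup), 'M4'] = 0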

Since each row of "CODE_LITH" holds multiple values, the script does not seem able to find "M4" on its own; it can only match the full string "M4,BO" when putting 1 or 0 in the new column.
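
One way to test for a code as a whole token, rather than as the entire cell value, is a regular expression anchored on the separators. The following is a minimal sketch, not a definitive solution: the helper name `has_code` is hypothetical, the sample frame is rebuilt from the example above, and it assumes that both `,` and `-` separate codes (the example data only suggests this).

```python
import re

import pandas as pd

# Sample frame rebuilt from the example in the question.
df = pd.DataFrame({
    "OBJECTID": [1, 2, 3, 4],
    "CODE_LITH": ["M4,BO", "M4,BO", "M4,BO", "M1,HP-M7,HP-M1"],
})


def has_code(series, code):
    """Return 1 where `code` occurs as a whole token, else 0.

    Assumption: tokens are delimited by ',' or '-' (or the string ends).
    """
    pattern = rf"(?:^|[,-]){re.escape(code)}(?:$|[,-])"
    # na=False treats missing values as "not found" so the int cast is safe.
    return series.str.contains(pattern, regex=True, na=False).astype(int)


df["M4"] = has_code(df["CODE_LITH"], "M4")
df["M1"] = has_code(df["CODE_LITH"], "M1")
print(df)
```

Anchoring on the separators keeps a code such as `M1` from matching inside a longer token such as `M10`, which a plain substring search would not guarantee.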

I have also tried:

if 'M4' in df['CODE_LITH']:  # note: `in` on a Series tests the index labels, not the values
    df['M4'] = 1
else:
    df['M4'] = 0

With the same results.
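
A likely reason the `if ... in ...` attempt fails is that the `in` operator on a pandas Series checks the index labels rather than the stored values, and the whole `if` is evaluated once for the entire column instead of once per row. The short sketch below illustrates the first point on a hypothetical two-row Series `s` with the default integer index.

```python
import pandas as pd

# Hypothetical two-row Series with the default index 0, 1.
s = pd.Series(["M4,BO", "M1,HP-M7,HP-M1"])

label_hit = "M4" in s            # False: "M4" is not an index label
index_hit = 0 in s               # True: 0 is an index label
value_hit = "M4,BO" in s.values  # True: membership test on the values

print(label_hit, index_hit, value_hit)
```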

Thanks for your help.

PS. The DataFrame contains about 2.6 million rows, and I need to do this operation for 30-50 variables.
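
For many codes at once, one option is to build all indicator columns in a single pass with `Series.str.get_dummies` and then keep only the codes of interest. This is a sketch under assumptions: the `codes` list is hypothetical (it stands in for the 30-50 variables), and `-` is treated as a separator in addition to `,`. It has not been timed on 2.6 million rows, so it may be worth comparing against a per-code `str.contains` loop on the real data.

```python
import pandas as pd

# Sample frame rebuilt from the example in the question.
df = pd.DataFrame({
    "OBJECTID": [1, 2, 3, 4],
    "CODE_LITH": ["M4,BO", "M4,BO", "M4,BO", "M1,HP-M7,HP-M1"],
})

# Hypothetical list of codes to flag; "M9" does not occur in the sample.
codes = ["M4", "M1", "M7", "M9"]

# Assumption: '-' also separates codes, so normalise it to ',' first.
tokens = df["CODE_LITH"].str.replace("-", ",", regex=False)

# One 0/1 column per distinct token found in the data.
dummies = tokens.str.get_dummies(sep=",")

# Keep only the requested codes; codes never seen become all-zero columns.
indicators = dummies.reindex(columns=codes, fill_value=0)

result = df.join(indicators)
print(result)
```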
