Monday, 13 August 2018

How to use a conditional statement with startswith() in Python with dfply?

I'm doing data wrangling in Python, using the dfply package.

I want to create a new variable "a06" from the column 'FC06' of the dataset data_a, such that:

  • a06 = 1 if FC06[i] starts with the character "1" (e.g. FC06[i] = 173)
  • a06 = 2 if FC06[i] starts with the character "2"
  • a06 = NaN if FC06[i] = NaN

In R, this would be:

data_a %>% mutate(a06 = ifelse(substr(FC06,1,1)=="1",1,ifelse(substr(FC06,1,1)=="2",2,NaN)))

but I can't find how to do this in Python.

I managed a first version with just two outcomes, NaN or 1:

data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull), float('nan'), 1))

but I can't find how to differentiate the result according to the first character of FC06.

I tried things like:

(data_a >> mutate(a06=if_else(X['FC06'].apply(pd.isnull),float('nan'),if_else(X['FC06'].apply(str)[0]=='1',1,2))))

but without success: either [0] doesn't get the first character there, or str() can't be used with apply (and neither can str.startswith('1')).
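A likely reason the [0] attempt fails, sketched here in plain pandas rather than dfply: indexing a Series with [0] looks up the row labelled 0, whereas the .str accessor works on each element. The sample values below are made up, since the real data_a is not shown, and whether the same chain works unchanged on dfply's X is an assumption I have not verified.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for data_a['FC06'] (float column because of NaN).
fc06 = pd.Series([173, 250, np.nan])

# Convert every element to text: '173.0', '250.0', 'nan'.
as_text = fc06.apply(str)

# [0] on a Series is a label lookup: it returns the whole string stored in row 0.
whole_first_row = as_text[0]

# .str[0] is element-wise: it returns the first character of each row.
first_chars = as_text.str[0]

# .str.startswith is element-wise too and gives a boolean mask.
starts_with_1 = as_text.str.startswith("1")
```

So the per-row first character comes from .str[0] (or .str.startswith('1')), not from [0] applied directly to the result of apply(str).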

Does anybody know how to handle this kind of situation?

Or is there another package that can do this in Python?
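For comparison, here is a minimal sketch of the rule above using only pandas, without dfply. The sample DataFrame is hypothetical, and the sketch assumes that values starting with anything other than "1" or "2" should also end up as NaN, which the rule above does not actually specify.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real data_a is not shown in the post.
data_a = pd.DataFrame({"FC06": [173, 250, np.nan, 1, 29]})

# First character of each value as text; missing rows never yield '1' or '2'.
first_char = data_a["FC06"].astype(str).str[0]

# Map '1' -> 1 and '2' -> 2; every other case, including NaN, becomes NaN.
data_a["a06"] = first_char.map({"1": 1, "2": 2})
```

Using map with a small dictionary avoids nesting conditionals, and adding a further leading digit only needs one more dictionary entry.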

Thank you!
