Wednesday, 29 January 2020

Use if statement to modify row value

I have a dataframe ddd with a field date containing messy date values as text:

import re
import pandas as pd

ddd = pd.DataFrame([["80's of 1900"], ["80's of the 19th century"], ["90's of the 18th century"], ["1955"], ["1822"]], columns=['date'])


In [2]: ddd
Out[2]: 
index date
0     80's of 1900  
1     80's of the 19th century  
2     90's of the 18th century
3     1955
4     1822

What I'm trying to do is transform the text values into years, like those in rows 3 and 4, for further analysis. To do so, I wrote a for loop with an if statement to distinguish rows like 0 from rows like 1 and 2.

So far, I have code that builds a numpy array of the index labels where the field date contains 's of, so that I can iterate over those rows:

selected_index = ddd[ddd["date"].str.contains('\'s of')].index.values

And a for loop with some regex to rearrange the digits in the string, changing '80's of 1900' to 1980 and '90's of the 18th century' to 1790:

for index in selected_index:
    if ddd.at[index, 'date'].str.contains('th century'):
        num = re.findall('[0-9]', ddd.at[index, 'date'])
        num2 = ''.join(num)
        num3 = str(num2)[2:4]
        num4 = int(num3) - 1
        num5 = str(num4)
        num6 = str(num2)[:2]
        ddd.at[index, 'date'] =  num5 + num6
    else:
        num = re.findall('[0-9]', ddd.at[index, 'date'])
        num2 = ''.join(num)
        num3 = str(num2)[:2]
        num4 = str(num2)[2:4]
        ddd.at[index, 'date'] =  num4 + num3

But I get the following error:

AttributeError: 'str' object has no attribute 'str'
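
For context, `ddd.at[index, 'date']` returns a single cell value, which here is a plain Python `str`, while `.str` is an accessor that exists only on a pandas Series or Index. A minimal sketch of the same loop using the ordinary `in` substring test is shown here; the variable names `text`, `digits`, `decade` and `century` are introduced only for illustration and are not part of the original code.

```python
import re
import pandas as pd

ddd = pd.DataFrame(
    [["80's of 1900"], ["80's of the 19th century"],
     ["90's of the 18th century"], ["1955"], ["1822"]],
    columns=["date"],
)

# Index labels of the rows whose text contains "'s of"
selected_index = ddd[ddd["date"].str.contains("'s of")].index.values

for index in selected_index:
    text = ddd.at[index, "date"]                 # a plain Python str, not a Series
    digits = "".join(re.findall("[0-9]", text))  # e.g. "8019" or "801900"
    decade = digits[:2]                          # e.g. "80"
    if "th century" in text:                     # substring test on a str
        # "19th century" refers to the 1800s, so subtract one
        century = str(int(digits[2:4]) - 1)
    else:
        century = digits[2:4]                    # "1900" -> "19"
    ddd.at[index, "date"] = century + decade

print(ddd["date"].tolist())
```

The values stay strings in this sketch, as in the original loop; a conversion such as `ddd["date"].astype(int)` could follow if numeric years are needed.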

Expected output:

index date
0     1980 
1     1880
2     1790
3     1955
4     1822
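
One possible way to reach this output without an explicit loop is to put the if statement inside a small function and apply it to the column. The sketch below is only an illustration under stated assumptions: the helper `to_year` and the new column `year` are hypothetical names, and the regex assumes every messy value looks like one of the three patterns shown in the sample data (a two-digit decade followed by either "NNth century" or a year ending in "00").

```python
import re
import pandas as pd

ddd = pd.DataFrame(
    [["80's of 1900"], ["80's of the 19th century"],
     ["90's of the 18th century"], ["1955"], ["1822"]],
    columns=["date"],
)

def to_year(text: str) -> int:
    """Hypothetical helper: turn one messy date string into an integer year."""
    match = re.fullmatch(r"(\d{2})'s of (?:the )?(\d{2})(th century|00)", text.strip())
    if match is None:
        # Assumption: anything else is already a plain year such as "1955"
        return int(text)
    decade, century, suffix = match.groups()
    if suffix == "th century":
        # The 19th century covers the 1800s, so subtract one
        century = str(int(century) - 1)
    return int(century + decade)

# Store the result in a separate (hypothetical) column to keep the raw text
ddd["year"] = ddd["date"].apply(to_year)
print(ddd)
```

Keeping the parsed value in its own column leaves the original text available for checking rows that do not match the assumed patterns.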

Thank you in advance for your suggestions!
