I have a dataframe ddd with a field date containing messy date values as text:
import re
import pandas as pd
ddd = pd.DataFrame([["80's of 1900"], ["80's of the 19th century"], ["90's of the 18th century"], ["1955"], ["1822"]], columns=['date'])
In [2]: ddd
Out[2]:
index date
0 80's of 1900
1 80's of the 19th century
2 90's of the 18th century
3 1955
4 1822
What I'm trying to do is transform the text values into a year, as in rows 3 and 4, for further analysis. To do so, I wrote a for loop with an if statement to distinguish rows like 0 from rows like 1 and 2.
So far, I have code that builds a numpy array of the indexes where the field date contains 's of, so that I can iterate over those rows:
selected_index = ddd[ddd["date"].str.contains('\'s of')].index.values
Then a for loop uses some regex to rearrange the digits in the string, changing '80's of 1900' to 1980 and '90's of the 18th century' to 1790:
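for index in selected_index:
    # This is the line that raises the AttributeError shown below
    if ddd.at[index, 'date'].str.contains('th century'):
        # e.g. "80's of the 19th century" -> digits "8019"
        num = re.findall('[0-9]', ddd.at[index, 'date'])
        num2 = ''.join(num)
        num3 = str(num2)[2:4]   # century number, "19"
        num4 = int(num3) - 1    # century prefix, 18
        num5 = str(num4)
        num6 = str(num2)[:2]    # decade, "80"
        ddd.at[index, 'date'] = num5 + num6
    else:
        # e.g. "80's of 1900" -> digits "801900"
        num = re.findall('[0-9]', ddd.at[index, 'date'])
        num2 = ''.join(num)
        num3 = str(num2)[:2]    # decade, "80"
        num4 = str(num2)[2:4]   # century prefix, "19"
        ddd.at[index, 'date'] = num4 + num3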
But I get the following error:
AttributeError: 'str' object has no attribute 'str'
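The likely cause is that ddd.at[index, 'date'] returns a single cell, which here is a plain Python string, while the .str accessor exists only on pandas Series and Index objects. A minimal sketch of the difference, on a shortened copy of the data and using the built-in in operator for the scalar check (the names mask, value and is_century are only illustrative):

```python
import pandas as pd

ddd = pd.DataFrame(
    [["80's of 1900"], ["80's of the 19th century"], ["1955"]],
    columns=["date"],
)

# Selecting a column gives a Series, so the .str accessor is available.
mask = ddd["date"].str.contains("'s of")

# .at returns one cell, here a plain Python str, which has no .str attribute.
value = ddd.at[1, "date"]

# A substring test on a plain string uses the `in` operator instead.
is_century = "th century" in value
```

Under this reading, replacing the condition in the loop with 'th century' in ddd.at[index, 'date'] should avoid the AttributeError without touching the rest of the logic.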
Expected output:
index date
0 1980
1 1880
2 1790
3 1955
4 1822
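One possible way to get this output without an explicit index loop is to put the conversion in a small function and apply it to the column. This is only a sketch under assumptions: to_year is a hypothetical helper, it handles only the two patterns shown in the sample data (ordinals ending in "th", and a four-digit century start such as 1900), and it returns the year as a string so the column stays text.

```python
import re
import pandas as pd

def to_year(text):
    """Hypothetical helper: turn the two sample patterns into a year string."""
    # Decade plus century ordinal: "80's of the 19th century" -> "1880"
    m = re.fullmatch(r"(\d{2})'s of the (\d{1,2})th century", text)
    if m:
        decade, century = int(m.group(1)), int(m.group(2))
        return str((century - 1) * 100 + decade)
    # Decade plus century start year: "80's of 1900" -> "1980"
    m = re.fullmatch(r"(\d{2})'s of (\d{4})", text)
    if m:
        decade, base = int(m.group(1)), int(m.group(2))
        return str(base + decade)
    # Plain years and anything unrecognized are left unchanged.
    return text

ddd = pd.DataFrame(
    [["80's of 1900"], ["80's of the 19th century"],
     ["90's of the 18th century"], ["1955"], ["1822"]],
    columns=["date"],
)
ddd["date"] = ddd["date"].apply(to_year)
print(ddd["date"].tolist())
# ['1980', '1880', '1790', '1955', '1822']
```

Values that match neither pattern pass through untouched, so they can be inspected afterwards rather than being silently converted.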
Thank you in advance for your suggestions!