I am working with a large dataset (more than 2 million rows x 10 columns) that has a date column. Some rows are formatted correctly (e.g. 2020/04/08), but others are not (concretely, they are formatted as 20200408).
I want to fix the format of the incorrect ones without iterating through all the rows.
Normally, for a small dataset I would do
from datetime import datetime

for i in range(len(df)):
    cell = str(df.iloc[i]['date'])
    # Only the 8-character values (e.g. 20200408) need reformatting
    if len(cell) == 8:
        df.iat[i, df.columns.get_loc('date')] = datetime.strptime(cell, '%Y%m%d').strftime('%Y-%m-%d')
but I know this is far from optimal.
How can I use the power of pandas to avoid the loop here?
Thanks!
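One possible vectorized approach, sketched under assumptions: build a boolean mask for the 8-character values, parse only those with pd.to_datetime, and write the reformatted strings back with .loc. The helper name normalize_dates, its out_format parameter, and the sample data are hypothetical and not part of the original post. The default output format mirrors the '%Y-%m-%d' used in the loop above; since the example of a correct row uses slashes, '%Y/%m/%d' may be the format actually wanted.

```python
import pandas as pd

# Hypothetical sample data mixing the two formats described in the question
df = pd.DataFrame({"date": ["2020/04/08", "20200408", "2019/12/31", "20210115"]})


def normalize_dates(frame, column="date", out_format="%Y-%m-%d"):
    """Reformat 8-character YYYYMMDD values in place; other rows are left as they are."""
    values = frame[column].astype(str)
    # Boolean mask selecting only the compact values (same test as len(cell) == 8)
    mask = values.str.len() == 8
    # Parse just the selected rows, then format them back to strings
    parsed = pd.to_datetime(values[mask], format="%Y%m%d")
    frame.loc[mask, column] = parsed.dt.strftime(out_format)
    return frame


normalize_dates(df)
print(df)
```

The mask keeps the same condition as the loop, so rows that already look correct are never parsed or rewritten. Passing out_format="%Y/%m/%d" would make the converted rows match the slash-separated example instead.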