Wednesday, April 8, 2020

Change date format for specific cells in Pandas

I am working with a large dataset (more than 2 million rows by 10 columns) that has a date column. Some of the rows are formatted correctly (e.g. 2020/04/08), but others are not: they are formatted as 20200408.

I want to fix the wrongly formatted values without iterating through all the rows.

Normally, for a small dataset, I would do:

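from datetime import datetime

# Visit every row and rewrite 8-character values (YYYYMMDD) as YYYY/MM/DD
date_col = df.columns.get_loc('date')
for i in range(len(df)):
    cell = str(df.iat[i, date_col])
    if len(cell) == 8:
        df.iat[i, date_col] = datetime.strptime(cell, '%Y%m%d').strftime('%Y/%m/%d')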

but I know this is far from optimal.

How can I use the power of pandas to avoid the loop here?

Thanks!
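One possible way to avoid the loop is sketched below. It is not a definitive answer. It assumes the column holds strings (or values that convert cleanly to strings), that every 8-character value really is a YYYYMMDD date, and that the target format is the one from the example, 2020/04/08. The sample frame and the names `s` and `mask` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the real 2-million-row frame.
df = pd.DataFrame({"date": ["2020/04/08", "20200408", "2019/12/31", "20191231"]})

# String view of the column, so the length check works on every value.
s = df["date"].astype(str)

# Boolean mask: True only for the 8-character (YYYYMMDD) values.
mask = s.str.len() == 8

# Parse just the masked values in one vectorized call, then write them
# back in the target format. Rows outside the mask are left untouched.
df.loc[mask, "date"] = (
    pd.to_datetime(s[mask], format="%Y%m%d").dt.strftime("%Y/%m/%d")
)

print(df["date"].tolist())
```

The mask restricts the parsing to the rows that need it, and `pd.to_datetime` with an explicit `format` raises an error on a value that is 8 characters long but not a valid date, which the length check alone would not catch. Change the `strftime` pattern if a different target format is wanted.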
