Saturday, April 11, 2020

Getting rid of a specific character in a string in a pandas column

I am working with a big dataset (more than 2 million rows x 10 columns) that has a price column. The values use a dot as the thousands separator (e.g. 1.000) and also use a dot as the decimal separator (e.g. 3.000.75 instead of 3000,75).
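The format can be reduced to one per-value rule: remove every dot except the last one, and treat that last dot as the decimal separator. Below is a minimal sketch of that rule; `parse_price` is a hypothetical helper name. Like the loop below, it only rewrites values with at least two dots, because a single-dot value such as 1.000 could mean either one thousand or 1.0, and the string alone cannot tell them apart.

```python
def parse_price(text: str) -> float:
    """Hypothetical helper: '3.000.75' -> 3000.75 (dot used as both separators)."""
    if text.count(".") >= 2:
        # Everything before the last dot is the integer part with thousands dots.
        integer_part, _, decimals = text.rpartition(".")
        text = integer_part.replace(".", "") + "." + decimals
    # Values with zero or one dot are passed to float() unchanged.
    return float(text)


print(parse_price("3.000.75"))      # 3000.75
print(parse_price("1.000.000.50"))  # 1000000.5
```

Splitting on the last dot with `str.rpartition` also removes the 1.000.000 limit, since any number of thousands dots is handled.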

I want to convert the column to float, but the values containing two dots are giving me headaches.

Typically, and assuming for simplicity that no number exceeds 1.000.000, I would do something like this:

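# df is the existing DataFrame with a string 'price' column
for i in range(len(df)):
    cell = str(df.iloc[i]['price'])
    if cell.count(".") == 2:  # e.g. "3.000.75": drop the thousands dot
        parts = cell.split(".")
        cell = parts[0] + parts[1] + "." + parts[2]
    df.at[df.index[i], 'price'] = cell  # write the cleaned value back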

And then, yes, convert the column to float.

But I know this is far from optimal, since the for loop runs in Python over more than 2 million rows.

How can I use pandas' vectorized operations to avoid the for loop here?
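One vectorized option, sketched here under the assumption that the column holds strings (or values that `astype(str)` turns into the format above), is a single regular-expression replace with `Series.str.replace`. The pattern `\.(?=.*\.)` matches any dot that has another dot somewhere after it, so all thousands dots are removed in one pass and only the last (decimal) dot survives, with no 1.000.000 limit. Like the loop, it leaves single-dot values untouched. The sample data is made up for illustration.

```python
import pandas as pd

# Made-up sample values in the question's format.
df = pd.DataFrame({"price": ["3.000.75", "1.000.000.50", "12.5", "7"]})

df["price"] = (
    df["price"]
    .astype(str)  # guard against mixed str/float cells
    # Delete every dot that is followed later by another dot,
    # i.e. all thousands separators, keeping the last (decimal) dot.
    .str.replace(r"\.(?=.*\.)", "", regex=True)
    .astype(float)
)

print(df["price"].tolist())  # [3000.75, 1000000.5, 12.5, 7.0]
```

If some cells may not parse at all, `pd.to_numeric(..., errors="coerce")` can replace the final `astype(float)`, turning unparseable values into NaN instead of raising.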

Thanks!
