I am working with a big dataset (more than 2 million rows x 10 columns) that has a price column. The values use a dot as the thousands separator (e.g. 1.000) and also a dot as the decimal separator (e.g. 3.000.75 instead of 3000,75).
I want to convert the column to float, but having two dots in a single value is giving me headaches.
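The parsing rule these examples imply can be sketched for a single string value. This is a minimal sketch under one assumption: when a value contains more than one dot, only the last dot is the decimal separator and every earlier dot is a thousands separator. A value with a single dot (e.g. 1.000) is ambiguous in this format; like the loop in the question, the sketch treats it as a decimal, so 1.000 becomes 1.0, not 1000.0. The function name `parse_price` is hypothetical.

```python
def parse_price(text: str) -> float:
    """Convert a dot-formatted price string such as '3.000.75' to a float.

    Assumption: the last dot is the decimal separator and all earlier dots
    are thousands separators. A single dot is treated as a decimal point.
    """
    if text.count(".") <= 1:
        # No thousands separators to remove ('100', '12.5', '1.000').
        return float(text)
    # Split off the digits after the last dot (the decimals).
    integer_part, decimals = text.rsplit(".", 1)
    # Drop the remaining dots, which are thousands separators.
    return float(integer_part.replace(".", "") + "." + decimals)


print(parse_price("3.000.75"))      # 3000.75
print(parse_price("1.234.567.89"))  # 1234567.89
print(parse_price("12.5"))          # 12.5
```

Unlike a fixed split into three parts, `rsplit(".", 1)` also copes with values of a million or more, which contain three dots.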
Typically, and assuming for simplicity that no value reaches 1.000.000, I would do something like this:
```python
for i in range(len(df)):
    cell = str(df.iloc[i]['price'])
    if cell.count(".") == 2:
        cell = cell.split(".")[0] + cell.split(".")[1] + '.' + cell.split(".")[2]
        df.at[df.index[i], 'price'] = cell  # write the fixed value back to the frame
```
And then, yes, convert the column to float.
But I know this row-by-row for loop is far from optimal, especially on a frame this size.
How can I use the power of pandas to avoid the loop here?
Thanks!
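One vectorized option is a regular-expression replace with `Series.str.replace`, sketched here under the same assumption as the loop: the last dot in a value is the decimal separator and all earlier dots are thousands separators. The pattern `\.(?=.*\.)` matches a dot only if another dot follows it somewhere later in the string, so it removes every dot except the last. The sample data below is made up for illustration.

```python
import pandas as pd

# Small stand-in for the real 2-million-row frame (hypothetical sample data).
df = pd.DataFrame({"price": ["3.000.75", "12.5", "1.234.567.89", "100"]})

# Remove every dot that has another dot somewhere after it, i.e. every dot
# except the last one. The (?=.*\.) lookahead keeps the final dot, which is
# assumed to be the decimal separator.
df["price"] = (
    df["price"]
    .astype(str)
    .str.replace(r"\.(?=.*\.)", "", regex=True)
    .astype(float)
)

print(df["price"].tolist())  # [3000.75, 12.5, 1234567.89, 100.0]
```

This avoids the per-row `df.iloc` lookups and also handles values above 1.000.000, since it does not depend on exactly two dots. A value with a single dot such as 1.000 is still read as 1.0, exactly as in the loop. If some values may not parse at all, replacing `.astype(float)` with `pd.to_numeric(..., errors="coerce")` turns them into NaN instead of raising an error.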