Sunday, 31 January 2021

How to optimise a slow for loop over a pandas column?

I have this basic dataframe:

     dur    type    src    dst
0     0     new     543     1
1     0     new     21      1
2     1     old     4828    2
3     0     new     321     1
...
(450,000 rows in total)
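A synthetic frame of the same shape can be generated to test faster approaches. This is a hypothetical sketch: the column names and the 450,000-row count come from the question, but the random value ranges (for example src between 1 and 9,999) are assumptions, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical reproducible sample: column names and row count come from
# the question; the value ranges below are illustrative assumptions.
rng = np.random.default_rng(0)
n_rows = 450_000

df = pd.DataFrame({
    "dur": rng.integers(0, 2, size=n_rows),           # 0 or 1
    "type": rng.choice(["new", "old"], size=n_rows),  # "new" or "old"
    "src": rng.integers(1, 10_000, size=n_rows),      # 1..9999
    "dst": rng.integers(1, 3, size=n_rows),           # 1 or 2
})

print(df.shape)
```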

My aim is to replace each value in src with 1, 2 or 3 depending on which range it falls in (up to 1000, up to 2500, up to 5000). I wrote the following for loop with an if/elif chain:

for i in df['src']:
    # each replace() call scans the whole src column
    if i <= 1000:
        df['src'].replace(to_replace=[i], value=[1], inplace=True)
    elif i <= 2500:
        df['src'].replace(to_replace=[i], value=[2], inplace=True)
    elif i <= 5000:
        df['src'].replace(to_replace=[i], value=[3], inplace=True)
    else:
        # values above 5000 are left unchanged
        print('End!')

The above works as intended, but it is very slow on the full 450,000-row dataframe: it takes more than 30 minutes to finish.
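The slowness most likely comes from the loop structure. Every iteration calls Series.replace, and each call scans the entire src column, so with 450,000 rows the total work grows roughly with the square of the row count. The sketch below keeps the same thresholds but classifies each value exactly once with a plain function passed to Series.map. Here bucket is a hypothetical helper name, and values above 5000 are assumed to stay unchanged, as they do in the loop. Computing the result from the original values also sidesteps a pitfall of repeated replace calls: if the column contains raw values such as 2 or 3, a later iteration can overwrite cells that were already bucketed.

```python
import pandas as pd


def bucket(value):
    """Map one src value to a bucket, using the thresholds from the loop.

    Values above 5000 are returned unchanged (the loop leaves them alone).
    """
    if value <= 1000:
        return 1
    elif value <= 2500:
        return 2
    elif value <= 5000:
        return 3
    return value


# Small illustrative sample (hypothetical values).
df = pd.DataFrame({"src": [543, 21, 4828, 321, 7000]})

# One pass over the column: each value is examined exactly once,
# instead of one full-column replace() per row.
df["src"] = df["src"].map(bucket)
print(df["src"].tolist())  # [1, 1, 3, 1, 7000]
```

This still runs Python code per element, so it is linear rather than quadratic but not fully vectorized.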

Is there a faster, more Pythonic way to do this?
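A fully vectorized alternative is numpy.select, which works on whole-column boolean masks instead of looping in Python. In this sketch the conditions are tried in order, matching the if/elif chain, and the default argument keeps values above 5000 unchanged. The small sample frame is illustrative only.

```python
import numpy as np
import pandas as pd

# Small illustrative sample (hypothetical values).
df = pd.DataFrame({"src": [543, 21, 4828, 321, 1500, 7000]})

src = df["src"]
# Conditions are evaluated on the original values and the first match
# wins, mirroring the if/elif chain in the question.
conditions = [src <= 1000, src <= 2500, src <= 5000]
choices = [1, 2, 3]

# Values that match no condition (above 5000) keep their original value.
df["src"] = np.select(conditions, choices, default=src.to_numpy())
print(df["src"].tolist())  # [1, 1, 3, 1, 2, 7000]
```

If every value is known to be at most 5000, pd.cut(src, bins=[-np.inf, 1000, 2500, 5000], labels=[1, 2, 3]) expresses the same right-inclusive bucketing, but it returns a categorical column and yields NaN for anything above 5000.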
