Wednesday, December 25, 2019

How can I improve my code that iterates through two data frames with 100K rows, to reduce processing time in Python?

Could you please take a look at my code and give me some advice on how to make it take less time to process? The main purpose is to look at each row (ID) in the test table and find the same ID in the list table; if they match, compute the time difference between the two rows with that ID and label the test row according to whether the wait was under one hour (3600 s). Thanks in advance.

test.csv has two columns (ID, Time) and 100K rows; list.csv has two columns (ID, Time) and 40K rows.

sample data:

    ID                Time
    83d-36615fa05fb0  2019-12-11 10:41:48

    import pandas as pd

    # Parse the Time columns as datetimes so subtraction yields Timedeltas
    test = pd.read_csv('test.csv', parse_dates=['Time'])
    lst = pd.read_csv('list.csv', parse_dates=['Time'])  # renamed: `list` shadows the built-in

    for row_index, row in test.iterrows():
        for row_index2, row2 in lst.iterrows():
            if row['ID'] == row2['ID']:
                time_difference = (row['Time'] - row2['Time']).total_seconds()

                # label matches that happened within one hour (0 < diff <= 3600 s)
                if 0 < time_difference <= 3600:
                    test.loc[row_index, 'result'] = "short waiting time"
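For comparison, one way the nested `iterrows` loop could be avoided (a sketch, not the only approach) is to `merge` the two frames on ID so pandas does the matching in vectorized code, then compute all time differences at once. The inline sample frames and the `_test`/`_list` suffixes below are illustrative assumptions; only the ID/Time column names come from the question.

```python
import pandas as pd

# Small inline sample standing in for test.csv and list.csv
test = pd.DataFrame({
    'ID': ['a1', 'a1', 'b2'],
    'Time': pd.to_datetime(['2019-12-11 10:41:48',
                            '2019-12-11 12:00:00',
                            '2019-12-11 09:00:00']),
})
lst = pd.DataFrame({
    'ID': ['a1', 'b2'],
    'Time': pd.to_datetime(['2019-12-11 10:00:00',
                            '2019-12-11 09:30:00']),
})

# One merge replaces the O(n*m) Python-level loop: each test row is
# paired with every list row that shares its ID.
merged = test.merge(lst, on='ID', suffixes=('_test', '_list'))

# Vectorized difference in seconds, then the same 0 < diff <= 3600 check
diff = (merged['Time_test'] - merged['Time_list']).dt.total_seconds()
merged['result'] = ''
merged.loc[(diff > 0) & (diff <= 3600), 'result'] = 'short waiting time'
```

With 100K and 40K rows this moves the pairwise comparison from Python loops into pandas/NumPy internals, which is usually orders of magnitude faster; note that if an ID repeats in both files, the merge produces one row per matching pair rather than one row per test row.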
