I keep an Excel file up to date in a dataframe old_df by appending new rows from another Excel file's dataframe new_df. Currently I simply pd.concat the new and old frames together if any of the dates in the new dataframe doesn't exist in the old one.
Some of the important columns in this file are:
Pub Date     Forecast Time  Forecast Date  State  Temp
2018-12-12   23:00:00       2018-12-20     AK     3
2018-12-12   02:00:00       2018-12-20     AK     3.2
2018-12-12   05:00:00       2018-12-20     AK     2.9
...
I want to make sure I skip duplicate rows when I update this old file with new data, i.e. skip any row whose combination of Pub Date, Forecast Time, Forecast Date and State is not unique.
Right now I'm using a pretty poor method for this, comparing lists of Pub Dates from the new and the old frames:
import sys
import pandas as pd

dateList_old = date_old.tolist()
dateList_new = date_new.tolist()
result = any(elm in dateList_new for elm in dateList_old)
if result:
    print('One or more of the dates already exists in the database')
    sys.exit()
else:
    frames = [old_df, new_df]
    result = pd.concat(frames)
    result.to_excel("file", encoding="utf-8", index=False)
But this runs into issues: if any incoming row shares a Pub Date with the old data, the entire write is abandoned.
I'd like it so that if a row's combination of Pub Date + Forecast Time + Forecast Date + State is already in old_df, that row is skipped, all other rows that don't yet exist are still written, and the script exits only if every one of these combinations already exists.
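To illustrate the behaviour I'm after, here is a sketch using small stand-in frames (hypothetical data, column names as above); it keeps only the new rows whose key combination is absent from old_df, using pandas' merge with indicator=True:

```python
import pandas as pd

# Columns that together identify a unique row
key_cols = ["Pub Date", "Forecast Time", "Forecast Date", "State"]

# Hypothetical stand-ins for old_df / new_df
old_df = pd.DataFrame({
    "Pub Date": ["2018-12-12", "2018-12-12"],
    "Forecast Time": ["23:00:00", "02:00:00"],
    "Forecast Date": ["2018-12-20", "2018-12-20"],
    "State": ["AK", "AK"],
    "Temp": [3, 3.2],
})
new_df = pd.DataFrame({
    "Pub Date": ["2018-12-12", "2018-12-13"],
    "Forecast Time": ["23:00:00", "05:00:00"],
    "Forecast Date": ["2018-12-20", "2018-12-21"],
    "State": ["AK", "AK"],
    "Temp": [3, 2.7],
})

# Left-merge new_df against old_df's keys; rows marked "left_only"
# exist only in new_df and are safe to append
merged = new_df.merge(old_df[key_cols], on=key_cols, how="left", indicator=True)
additions = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

if additions.empty:
    # Every incoming combination already exists: nothing to write
    print("All rows already exist in the database")
else:
    # Append only the genuinely new rows
    result = pd.concat([old_df, additions], ignore_index=True)
```

In this example the first new row duplicates an existing key combination and is dropped, while the second is appended.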
Is there an easy way to do this?