Monday, 15 February 2021

Delete rows if a cell value has no match in another dataset

I just started learning Python yesterday for data analysis and have run into a problem. I hope somebody can help.

I have two datasets, each with two columns (which can be combined into one column) and thousands of rows.

In each dataset, one column is a timestamp, the other is a measurement. I need to correlate the two measurements, but one of the datasets has extra measurements.

I want the code to read one row from the first (full) dataset, check if there is a corresponding measurement in the second dataset made at the same date/time, and if not then delete the extra row from the first dataset. I want to repeat this for all rows in the first dataset. Please see the image illustrating the principle.

In the image: the first row of dataset 1 is read and a corresponding measurement for the same time (07/05/2013 08:00) is found somewhere in dataset 2, so there is no problem, and the same goes for row two. Row three in dataset 1 was measured at 07/05/2013 08:30, but there is no measurement at 07/05/2013 08:30 in dataset 2, so row three should be deleted from dataset 1.
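The example above could be sketched with pandas roughly as follows. This is only an illustration under assumptions: the column names `timestamp` and `measurement`, the measurement values, the 08:15 time for row two, and the day/month/year date format are all made up, since the real data is only shown in the image.

```python
import pandas as pd

FMT = "%d/%m/%Y %H:%M"  # assumed day/month/year layout

# Hypothetical stand-in for dataset 1 (the full one, with extra rows).
df1 = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["07/05/2013 08:00", "07/05/2013 08:15", "07/05/2013 08:30"],
        format=FMT,
    ),
    "measurement": [1.0, 2.0, 3.0],
})

# Hypothetical stand-in for dataset 2; row order does not need to match.
df2 = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["07/05/2013 08:15", "07/05/2013 08:00"],
        format=FMT,
    ),
    "measurement": [20.0, 10.0],
})

# One True/False per row of df1: does its timestamp appear anywhere in df2?
mask = df1["timestamp"].isin(df2["timestamp"])

# Indexing with the mask keeps whole rows, so the 08:30 row is dropped.
filtered = df1[mask].reset_index(drop=True)
```

Since the goal is to correlate the two measurements, an inner `merge` on the timestamp column (`df1.merge(df2, on="timestamp")`) may be worth considering as well, because it drops the unmatched rows and lines up both measurements side by side in one step.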

I think I need an if statement, but I am confused by mixing cell, row, and column indexing in the same statement: the condition checks one cell in a row against a column of another dataset, and the action then applies not to the individual cell that was checked but to the whole row it came from.
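One way to see how the cell, the column, and the row fit together is an explicit loop in plain Python. This is a minimal sketch, assuming each dataset is held as a list of `(timestamp, measurement)` tuples; the names `dataset1`, `dataset2`, `times_in_2`, and `kept_rows` and the sample values are hypothetical. Rather than deleting rows while looping, it builds a new list of the rows to keep, which avoids modifying a list during iteration.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"  # assumed day/month/year layout

# Hypothetical rows, each a (timestamp, measurement) tuple.
dataset1 = [
    (datetime.strptime("07/05/2013 08:00", FMT), 1.0),
    (datetime.strptime("07/05/2013 08:15", FMT), 2.0),
    (datetime.strptime("07/05/2013 08:30", FMT), 3.0),
]
dataset2 = [
    (datetime.strptime("07/05/2013 08:15", FMT), 20.0),
    (datetime.strptime("07/05/2013 08:00", FMT), 10.0),
]

# Pull out the timestamp column of dataset 2 once, as a set for fast lookups.
times_in_2 = {timestamp for timestamp, _ in dataset2}

kept_rows = []
for row in dataset1:
    timestamp = row[0]           # the single cell being checked
    if timestamp in times_in_2:  # compared against the whole column of dataset 2
        kept_rows.append(row)    # the action applies to the entire row
```

With this sample data, `kept_rows` ends up holding the 08:00 and 08:15 rows of `dataset1`, and the 08:30 row is left out.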

Thank you
