I need to subset a dataframe (df1) that has measurements (temp) recorded for every 5 minutes, with datetime as the index.
Dataframe df2, contains data on when there has been an event. 0 is the start of the event and 1 is the end of the event. df2 has a column called date, which is the datetime of the start and end of the respective event. The start and end of all events are recorded to the nearest second.
I want to subset df1 based on the times that there has been an event, using the same datetime format as contained in df1 (temp for every 5 minutes).
In the example below, there has been an event between 00:07:00 and 00:14:00, so I would like df3 to contain df1['temp'] 00:05:00 and 00:10:00. There has also been an event between 00:41:00 and 00:44:00, so i would also like df3 to contain 00:40:00.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'temp' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]},
index=pd.date_range('2019-05-02T00:00:00', '2019-05-02T01:00:00', freq='5T'))
df2 = pd.DataFrame({'event' : [0, 1, 0, 1],
'date' : ['2019-05-02-00:07:00', '2019-05-02-00:14:00', '2019-05-02-00:41:00', '2019-05-02-00:44:00']})
df2['date'] = pd.to_datetime(df2['date'])
df3 = pd.DataFrame({'result' : [2, 3, 9],
'date' :['2019-05-02-00:05:00', '2019-05-02-00:10:00', '2019-05-02-00:40:00']})
In my actual work, I have 7 separate df's that each contain different events, which I want to subset df1 and combine, so I end up with a single df that is a subset of all the data in df1, when there has been an event in any of the other 7 df's. df1, in reality, has 37 columns with data that I want to be transferred to the final df3. Once I have the code for the subsetting as above, I was going to merge all of the subset data and delete any duplicates.
Aucun commentaire:
Enregistrer un commentaire