vendredi 27 novembre 2020

R to create a tally of previous events within a sliding window time period

Any one able to assist me with a problem using R to create a sum of previous events in a specific time period please? I apologise if I do not follow protocol this is my first question here.

I have a df of IDs and event dates. In the genuine df the events are date times but to keep things simpler I have used dates here.

I am trying to create a variable which is a tally of the number of previous events within the last 2 years for that ID number (or 750 days as Im not too concerned about leap years but that would be nice to factor in).

There is an additional issue in that some IDs will have more than one event on the same date (and time in the genuine df). I do not want to remove these as in the real df there is an additional variable which is not necessarily the same. However, in the sum I want to count events happening on the same date as one event i.e. if an ID had only 2 events but they were on the same day the result would be 0, or there may be 3 rows of dates within the previous 2 years for an ID - but as two are the same date the result is 2. I have created a outcome vector to give an example of what I have after ID 7 has examples of this.

If there were 3 previous events all on the same day the result sum would be 1 and any subsequent events in 2 years

ID <- c(10,1,11,2,2,13,4,5,6,6,13,7,7,7,8,8,9,9,9,10,1,11,2,11,12,9,13,14,7,15,7)
event.date<-c('2018-09-09','2016-06-02','2018-08-20', '2018-11-03', '2018-07-10', '2017-03-08', '2018-06-16', '2017-05-20', '2016-04-02', '2016-07-27', '2018-07-15', '2018-06-15', '2018-06-15', '2018-01-16', '2017-10-07', '2016-08-17','2018-08-01','2017-01-22','2016-08-05', '2018-08-13', '2016-11-28', '2018-11-24','2016-06-01', '2018-03-26', '2017-02-04', '2017-12-01', '2016-05-16', '2017-11-25', '2018-04-01', '2017-09-21', '2018-04-01')
df<-data.frame(ID,event.date)

df<-df%>%
  arrange(ID,event.date)

The resulting column should look something like this.

event.count <- c(0,1,0,0,1,0,0,0,1,0,1,1,2,2,0,1,0,1,2,3,0,1,0,1,2,0,0,1,1,0,0)
df$event.count<-event.count

I have tried various if else and use of lag() but cannot get what I am after

thank you.

Aucun commentaire:

Enregistrer un commentaire