I have a dataset of telemetry locations, and I want to select continuous monitoring periods, that is, periods with data every day. Specifically, I want periods of at least 10 days with two locations per day, but I want to allow a single one-day break: a period of 10 or more days may contain one one-day gap, but no more.
First, I extracted from the dataset the days that meet the location-count criterion and calculated the difference in days between consecutive retained dates (diff). Now I am struggling to select the days that belong to continuous periods containing at most one 1-day gap. In the example below, the first such gap is "06/03/2018", which did not have any locations.
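For context, this first step can be sketched roughly as follows in base R. The data frame `locs` and its column `date` are hypothetical stand-ins for my real data, and I am assuming here that "two locations per day" means at least two.

```r
# Hypothetical raw telemetry: one row per location fix (names are assumptions)
locs <- data.frame(
  date = as.Date(c("2018-02-23", "2018-02-23", "2018-02-24", "2018-02-24",
                   "2018-02-25", "2018-02-26", "2018-02-26", "2018-02-26"))
)

# Count locations per day and keep the days that meet the criterion
# (assumed here to be "at least two locations")
n_per_day <- table(locs$date)
days <- as.Date(names(n_per_day)[n_per_day >= 2])

# Difference in days to the previous retained day;
# the first retained day has no predecessor, hence NA
daily <- data.frame(date = days,
                    diff = c(NA, as.numeric(diff(days))))
```

In this toy input, 25/02/2018 has only one location, so it is dropped and the following day gets diff = 2.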
An example of the data that should be selected:
date diff
23/02/2018 1 # select
24/02/2018 1 # select
25/02/2018 1 # select
26/02/2018 1 # select
27/02/2018 1 # select
28/02/2018 1 # select
01/03/2018 1 # select
02/03/2018 1 # select
03/03/2018 1 # select
04/03/2018 1 # select
05/03/2018 1 # select
07/03/2018 2 # select
08/03/2018 1 # select
09/03/2018 1 # select
10/03/2018 1 # select
11/03/2018 1 # select
13/03/2018 2 # do not select
14/03/2018 1 # do not select
15/03/2018 1 # do not select
16/03/2018 1 # do not select
18/03/2018 2 # do not select
19/03/2018 1 # do not select
05/06/2018 78 # select
06/06/2018 1 # select
07/06/2018 1 # select
08/06/2018 1 # select
09/06/2018 1 # select
10/06/2018 1 # select
11/06/2018 1 # select
12/06/2018 1 # select
13/06/2018 1 # select
14/06/2018 1 # select
15/06/2018 1 # select
16/06/2018 1 # select
17/06/2018 1 # select
19/06/2018 2 # select
20/06/2018 1 # select
21/06/2018 2 # do not select
I thought of creating a column with cumsum() and an if statement, where the cumulative sum would restart each time diff >= 3, and then flagging whether each day directly follows the previous one (true/false).
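A minimal sketch of that restart idea in base R, using a short made-up vector of diff values rather than my full data. The names `restart`, `seq_id`, `cum_days` and `is_continuous` are only illustrative.

```r
# Made-up diff values (days since the previous retained day)
diff_days <- c(1, 1, 1, 2, 1, 78, 1, 2, 1)

# A new sequence starts at the first row and whenever diff >= 3
restart <- diff_days >= 3 | seq_along(diff_days) == 1
seq_id  <- cumsum(restart)

# Within each sequence, the first row counts as 1 and later rows add their diff
step     <- ifelse(restart, 1, diff_days)
cum_days <- ave(step, seq_id, FUN = cumsum)

# NA at the start of a sequence, TRUE for consecutive days, FALSE after a 1-day gap
is_continuous <- ifelse(restart, NA, diff_days == 1)
```

Note that this only restarts on long breaks (diff >= 3); it does not by itself stop a sequence at a second 1-day gap.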
An example:
date diff cumsum is_continuous
23/02/2018 1 1 NA # select
24/02/2018 1 2 true # select
25/02/2018 1 3 true # select
26/02/2018 1 4 true # select
27/02/2018 1 5 true # select
28/02/2018 1 6 true # select
01/03/2018 1 7 true # select
02/03/2018 1 8 true # select
03/03/2018 1 9 true # select
04/03/2018 1 10 true # select
05/03/2018 1 11 true # select
07/03/2018 2 13 false # select
08/03/2018 1 14 true # select
09/03/2018 1 15 true # select
10/03/2018 1 16 true # select
11/03/2018 1 17 true # select
13/03/2018 2 19 false # do not select
14/03/2018 1 20 true # do not select
15/03/2018 1 21 true # do not select
16/03/2018 1 22 true # do not select
18/03/2018 2 24 false # do not select
19/03/2018 1 25 true # do not select
05/06/2018 78 1 NA # select
06/06/2018 1 2 true # select
07/06/2018 1 3 true # select
08/06/2018 1 4 true # select
09/06/2018 1 5 true # select
10/06/2018 1 6 true # select
11/06/2018 1 7 true # select
12/06/2018 1 8 true # select
13/06/2018 1 9 true # select
14/06/2018 1 10 true # select
15/06/2018 1 11 true # select
16/06/2018 1 12 true # select
17/06/2018 1 13 true # select
19/06/2018 2 15 false # select
20/06/2018 1 16 true # select
21/06/2018 2 18 false # do not select
Then, I would select the periods with at least 10 "true", or 9 "true" and 1 "false" (but only one!), but I am not sure how to code this in R. I also have to take into account whether a row belongs to the same "cumsum" sequence or starts a new one, which is marked by the "NA". Any ideas?
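For reference, here is a rough sketch of the kind of logic I imagine, written as a plain loop. It is not a tested solution. The function `select_periods` and its arguments are hypothetical names, and it rests on several assumptions: a new period starts after any break with diff >= 3 or at the second 1-day gap, periods are built greedily from the first date onward, and "at least 10 days" counts days with data. The dates are those of my example up to 20/06/2018.

```r
# Hypothetical helper: assign each day to a period and flag periods to keep
select_periods <- function(dates, min_days = 10, max_gaps = 1) {
  dates <- sort(unique(dates))
  d <- c(Inf, as.numeric(diff(dates)))  # Inf forces a new period at the first row
  period <- integer(length(dates))
  id <- 0L
  gaps <- 0L
  for (i in seq_along(dates)) {
    if (d[i] == 2) gaps <- gaps + 1L    # a single missing day
    if (d[i] >= 3 || gaps > max_gaps) { # long break, or one gap too many
      id <- id + 1L                     # this row starts a new period
      gaps <- 0L
    }
    period[i] <- id
  }
  # Number of days with data in each period
  n_days <- ave(seq_along(dates), period, FUN = length)
  data.frame(date = dates, period = period, selected = n_days >= min_days)
}

# Dates from the example above (days with enough locations)
dates <- c(
  seq(as.Date("2018-02-23"), as.Date("2018-03-05"), by = "day"),
  seq(as.Date("2018-03-07"), as.Date("2018-03-11"), by = "day"),
  seq(as.Date("2018-03-13"), as.Date("2018-03-16"), by = "day"),
  seq(as.Date("2018-03-18"), as.Date("2018-03-19"), by = "day"),
  seq(as.Date("2018-06-05"), as.Date("2018-06-17"), by = "day"),
  seq(as.Date("2018-06-19"), as.Date("2018-06-20"), by = "day")
)

res <- select_periods(dates)
res[res$selected, ]
```

Because the split is greedy, a day that follows a second gap always opens a new period, so this sketch may miss a qualifying window that overlaps two periods. I am not sure whether that matters for my data.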
Any help would be appreciated!