Tuesday, 9 February 2021

R - Filter a dataset with conditions and exceptions

I have a dataset of telemetry locations, and I want to select continuous periods of monitoring, that is, periods with daily data. I want to select periods with two locations per day that last at least 10 days, but I want to allow a one-day break. In other words, a period of 10 or more days may include a single one-day gap, but no more than that.

First, I extracted from the dataset the days that meet the criterion on the number of locations, and I calculated the difference in days between consecutive retained days (diff). Now I am struggling to select the days that belong to continuous periods while allowing the exception of one 1-day gap (in the example below, the first such gap is "06/03/2018", which did not have any locations), but not more than that.
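
For reference, this first step could be sketched in base R roughly as follows. The data frame `locs` and its `timestamp` column are hypothetical names and made-up values used only for illustration; they are not my real data. Note that `diff` is `NA` for the first retained day here, whereas my table below shows 1 for its first row.

```r
# Hypothetical input: one row per telemetry fix, with a POSIXct timestamp.
locs <- data.frame(
  timestamp = as.POSIXct(c(
    "2018-02-23 06:00", "2018-02-23 18:00",
    "2018-02-24 06:00", "2018-02-24 18:00",
    "2018-02-25 06:00",                      # only one fix: this day is dropped
    "2018-02-26 06:00", "2018-02-26 18:00"
  ), tz = "UTC")
)

# Count fixes per calendar day.
locs$date <- as.Date(locs$timestamp, tz = "UTC")
per_day <- as.data.frame(table(date = locs$date), stringsAsFactors = FALSE)
per_day$date <- as.Date(per_day$date)

# Keep only the days with at least two locations, in chronological order.
days <- per_day[per_day$Freq >= 2, "date", drop = FALSE]
days <- days[order(days$date), , drop = FALSE]

# Days elapsed since the previous retained day (NA for the first one).
days$diff <- c(NA, as.numeric(diff(days$date)))
```

With this toy input, 25/02/2018 is dropped because it has a single fix, so the retained day after the gap gets a `diff` of 2.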

An example of the data that should be selected:

date        diff    
23/02/2018  1   # select
24/02/2018  1   # select
25/02/2018  1   # select
26/02/2018  1   # select
27/02/2018  1   # select
28/02/2018  1   # select
01/03/2018  1   # select
02/03/2018  1   # select
03/03/2018  1   # select
04/03/2018  1   # select
05/03/2018  1   # select
07/03/2018  2   # select
08/03/2018  1   # select
09/03/2018  1   # select
10/03/2018  1   # select
11/03/2018  1   # select
13/03/2018  2   # do not select
14/03/2018  1   # do not select
15/03/2018  1   # do not select
16/03/2018  1   # do not select
18/03/2018  2   # do not select
19/03/2018  1   # do not select
05/06/2018  78  # select
06/06/2018  1   # select
07/06/2018  1   # select
08/06/2018  1   # select
09/06/2018  1   # select
10/06/2018  1   # select
11/06/2018  1   # select
12/06/2018  1   # select
13/06/2018  1   # select
14/06/2018  1   # select
15/06/2018  1   # select
16/06/2018  1   # select
17/06/2018  1   # select
19/06/2018  2   # select
20/06/2018  1   # select
21/06/2018  2   # do not select

I thought of creating a column with cumsum() and an if statement, where the cumulative sum would restart each time diff >= 3. Then I would check whether each day is consecutive with the previous one (true/false).

An example:

date       diff cumsum  is_continuous
23/02/2018  1   1       NA           # select
24/02/2018  1   2       true         # select
25/02/2018  1   3       true         # select
26/02/2018  1   4       true         # select
27/02/2018  1   5       true         # select
28/02/2018  1   6       true         # select
01/03/2018  1   7       true         # select
02/03/2018  1   8       true         # select
03/03/2018  1   9       true         # select
04/03/2018  1   10      true         # select
05/03/2018  1   11      true         # select
07/03/2018  2   13      false        # select
08/03/2018  1   14      true         # select
09/03/2018  1   15      true         # select
10/03/2018  1   16      true         # select
11/03/2018  1   17      true         # select
13/03/2018  2   19      false        # do not select
14/03/2018  1   20      true         # do not select
15/03/2018  1   21      true         # do not select
16/03/2018  1   22      true         # do not select
18/03/2018  2   24      false        # do not select
19/03/2018  1   25      true         # do not select
05/06/2018  78  1       NA           # select
06/06/2018  1   2       true         # select
07/06/2018  1   3       true         # select
08/06/2018  1   4       true         # select
09/06/2018  1   5       true         # select
10/06/2018  1   6       true         # select
11/06/2018  1   7       true         # select
12/06/2018  1   8       true         # select
13/06/2018  1   9       true         # select
14/06/2018  1   10      true         # select
15/06/2018  1   11      true         # select
16/06/2018  1   12      true         # select
17/06/2018  1   13      true         # select
19/06/2018  2   15      false        # select
20/06/2018  1   16      true         # select
21/06/2018  2   18      false        # do not select
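
The two helper columns above could probably be built along these lines. This is only a sketch: the `diff` values are copied from the example table, while `seq_id` and `step` are names I made up for the illustration.

```r
# diff column copied from the example table (38 rows).
d <- data.frame(
  diff = c(rep(1, 11), 2, rep(1, 4), 2, rep(1, 3), 2, 1,
           78, rep(1, 12), 2, 1, 2)
)

# A new sequence starts whenever the jump is 3 days or more.
d$seq_id <- cumsum(d$diff >= 3) + 1

# Running total of diff within each sequence; the row that opens a
# sequence contributes 1 instead of its (large) diff.
step <- ifelse(d$diff >= 3, 1, d$diff)
d$cumsum <- ave(step, d$seq_id, FUN = cumsum)

# TRUE for a consecutive day, FALSE after a 1-day gap,
# NA on the first row of each sequence.
is_start <- seq_len(nrow(d)) == 1 | d$diff >= 3
d$is_continuous <- ifelse(is_start, NA, d$diff == 1)
```

`ave()` applies `cumsum()` separately within each `seq_id`, which is what makes the running total restart after the 78-day jump.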

Then I would keep the periods with at least 10 "true", or with 9 "true" and 1 "false" (but only one!), but I am not sure how to code this in R. I also have to take into account whether a row belongs to the same "cumsum" sequence or to a new one, which is marked by the "NA". Any ideas?
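
One possible way to express this rule, as an untested-on-real-data sketch rather than a definitive solution: split the days into blocks of strictly consecutive days, then keep a block if it reaches 10 days on its own or together with one neighbouring block that is separated from it by exactly one missing day. The `diff` values are copied from the example table; all variable names are made up. It treats 05/06/2018 as the first day of the second period, and it assumes that "10 days" counts days with data, not calendar days.

```r
# diff column copied from the example table (38 rows).
diff_days <- c(rep(1, 11), 2, rep(1, 4), 2, rep(1, 3), 2, 1,
               78, rep(1, 12), 2, 1, 2)

# A block is a stretch of strictly consecutive days; the first row and
# any row whose diff is not 1 open a new block.
starts <- diff_days != 1
starts[1] <- TRUE
block <- cumsum(starts)
len <- tabulate(block)            # number of days with data in each block
n_blocks <- length(len)

# gap_before[i]: block i follows block i - 1 after exactly one missing day.
gap_before <- diff_days[starts] == 2
gap_before[1] <- FALSE

# Length of each block merged with its previous / next neighbour
# (0 when the two blocks cannot be merged).
with_prev <- ifelse(gap_before, len + c(0, len[-n_blocks]), 0)
with_next <- c(with_prev[-1], 0)

# Keep a block if it, or one of its allowed merges, has at least 10 days.
keep_block <- pmax(len, with_prev, with_next) >= 10
select <- keep_block[block]
```

On the example, `select` is TRUE for 23/02/2018 to 11/03/2018 and for 05/06/2018 to 20/06/2018, and FALSE elsewhere. One caveat of this design: a short block sitting between two longer ones can be kept through either neighbour, so the kept days may chain across two gaps; if that is not wanted, the periods would have to be assigned greedily instead.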

Any help would be appreciated!
