Tuesday, 13 July 2021

Is there a way to manipulate R dataframe rows based on values in the rows adjacent to them?

I'm looking at a metric for modelled populations that fluctuates somewhat randomly. If it exceeds a set limit for a year or two, that's not a problem, but if it exceeds the limit for several consecutive years, it's a sign that something is going wrong with the population.

I have data frames holding a time series (t) in which each value is either inside (not_out) or outside (out) a set limit. I want to extract the first t at which the value is 'out' for at least a set number of consecutive rows, for example 3. E.g.:

    library(dplyr)  # provides mutate()

    set.seed(36)

    example.data <- data.frame(t = 1:15,
                               value = sample(-3:3, 15, replace = TRUE),
                               limit = 2)

    # Flag each row as "out" when the value is at or beyond +/- limit
    example.data <- mutate(example.data,
                           out_ = ifelse(value >= limit | value <= -limit, "out", "not_out"))

    example.data

        t value limit    out_
    1   1     1     2 not_out
    2   2     2     2     out
    3   3    -1     2 not_out
    4   4     0     2 not_out
    5   5    -3     2     out
    6   6     3     2     out
    7   7    -2     2     out
    8   8     0     2 not_out
    9   9     3     2     out
    10 10     1     2 not_out
    11 11     1     2 not_out
    12 12     3     2     out
    13 13     3     2     out
    14 14     0     2 not_out
    15 15     0     2 not_out

So t == 5 would be the answer: it is the first row where the value goes 'out' and stays 'out' for at least 3 consecutive rows (t = 5, 6 and 7).
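One loop-free way to do this, offered as a sketch rather than a definitive answer, is base R's `rle()` (run-length encoding), which collapses consecutive identical values into runs and records the length of each run. The helper name `first_run_start()` and its arguments are hypothetical, and the `out_flags` values are copied from the table above rather than regenerated with `set.seed(36)`.

```r
# Hypothetical helper: first t at which `flag` is "out" for at least n consecutive rows.
first_run_start <- function(t, flag, n = 3) {
  r <- rle(as.character(flag))          # as.character() so factors also work
  ends <- cumsum(r$lengths)             # row index where each run ends
  starts <- ends - r$lengths + 1        # row index where each run starts
  hit <- which(r$values == "out" & r$lengths >= n)
  if (length(hit) == 0) return(NA)      # no run is long enough
  t[starts[hit[1]]]                     # t at the start of the first qualifying run
}

# out_ column copied from the printed table above
out_flags <- c("not_out", "out", "not_out", "not_out", "out", "out", "out",
               "not_out", "out", "not_out", "not_out", "out", "out",
               "not_out", "not_out")

first_run_start(1:15, out_flags, n = 3)  # 5
```

With the question's own data, `first_run_start(example.data$t, example.data$out_, n = 3)` should work the same way; it returns `NA` when no run of `"out"` is long enough.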

I tried to solve this with a for loop and an if statement along the lines of...

    for(t in min(example.data$t) : max(example.data$t)) {
      if(example.data$out_ == "out"){
        a <- t
        return(a)
      }
    }

But I'm struggling to get it to work even for a single instance of out_ == "out", and I don't know how to tell R to look at t, t+1, ..., t+n when making the evaluation. Any help would be greatly appreciated.
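The loop above fails for two reasons: `example.data$out_ == "out"` compares the whole column at once, so `if()` receives 15 logical values instead of one (an error since R 4.2.0, a warning that uses only the first element before that), and `return()` only works inside a function. A minimal sketch of a loop that does look at rows t, t+1, ..., t+n-1 follows; `first_out_window()` is a hypothetical name, and `n = 1` covers the single-instance case.

```r
# Hypothetical helper: slide a window of n rows along `flag` and return the
# t of the first window in which every row is "out".
first_out_window <- function(t, flag, n = 3) {
  if (length(flag) < n) return(NA)        # too few rows for even one window
  for (i in seq_len(length(flag) - n + 1)) {
    window <- flag[i:(i + n - 1)]         # row i and the n - 1 rows after it
    if (all(window == "out")) {           # all() reduces the window to one TRUE/FALSE
      return(t[i])                        # return() is valid here: we are inside a function
    }
  }
  NA                                      # no window of n consecutive "out" rows
}

# out_ column copied from the printed table in the question
out_flags <- c("not_out", "out", "not_out", "not_out", "out", "out", "out",
               "not_out", "out", "not_out", "not_out", "out", "out",
               "not_out", "not_out")

first_out_window(1:15, out_flags, n = 3)  # 5
first_out_window(1:15, out_flags, n = 1)  # 2
```

The `rle()` approach scales better on long series, but the window loop maps directly onto the "look at t to t+n" idea.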
