Tuesday, 25 April 2017

Creating a sequence with nested ifelse, depending on different Value ranges and NAs, gives the wrong result

I have a dataframe like this:

time  Value  Seq.Count
   1      0          0
   2      0          0
   3      3          0
   4      4          0 
   5      10         0
   6      10         0
   7      10         0
   8      7          0
   9      6          0
  10      NA         0
  11      NA         0
  12      NA         0
  13      0          0
  14      0          0
  15      0          0

Now I want the "Seq.Count" column to increase by one every time the number X in the "Value" column moves from one of the following categories to another:

X == 0, 0 < X < 10, X == 10, X is NA

So I want to end up with something like the following:

time  Value  Seq.Count
   1      0          0
   2      0          0
   3      3          1
   4      4          1 
   5      10         2
   6      10         2
   7      10         2
   8      7          3
   9      6          3
  10      NA         4
  11      NA         4
  12      NA         4
  13      0          5
  14      0          5
  15      0          5
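
As a point of comparison, the rule above could be sketched in base R by mapping each Value to a category and counting the category changes. This is only a minimal sketch: the helper name categorise and its labels are hypothetical, and it assumes every non-missing value other than 0 and 10 belongs to the "between" category.

```r
# Example data from the question
df <- data.frame(
  time  = 1:15,
  Value = c(0, 0, 3, 4, 10, 10, 10, 7, 6, NA, NA, NA, 0, 0, 0)
)

# Hypothetical helper: map each value to one of the four categories.
# NA is tested first because comparisons such as NA == 0 return NA.
# Assumption: anything that is not NA, 0 or 10 counts as "between".
categorise <- function(x) {
  ifelse(is.na(x), "na",
         ifelse(x == 0, "zero",
                ifelse(x == 10, "ten", "between")))
}

cat_id <- categorise(df$Value)

# A new run starts wherever the category differs from the previous row;
# the cumulative sum of those change flags is the sequence counter.
df$Seq.Count <- c(0, cumsum(cat_id[-1] != cat_id[-length(cat_id)]))

print(df)
```

Because the categories are compared as plain strings, no row ever has to look at the previous counter value, so a missing Value cannot leak into Seq.Count.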

I wrote this code:

for (i in 2:nrow(df)) {
  df$Seq.Count[i] <-  ifelse(df$Value[i] == 10,
                                ifelse(df$Value[(i-1)] != 10, df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
                                ifelse(df$Value[i] == 0,
                                       ifelse(df$Value[(i-1)] != 0, df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
                                       ifelse(between(df$Value[i], 0.01, 9.99),
                                              ifelse(df$Value[i-1] == 0 | df$Value[i-1] == 10 | is.na(df$Value[i-1]),
                                                    df$Seq.Count[i-1]+1,df$Seq.Count[i-1]),
                                              ifelse(is.na(df$Value[i]),
                                                     ifelse(!is.na(df$Value[i-1]), df$Seq.Count[i-1]+1, df$Seq.Count[i-1]),
                                                     df$Seq.Count[i-1]
                                                     )
                                              )
                                       )
                                )
}

What this actually gives me is the following:

time  Value  Seq.Count
   1      0          0
   2      0          0
   3      3          1
   4      4          1 
   5      10         2
   6      10         2
   7      10         2
   8      7          3
   9      6          3
  10      NA         NA
  11      NA         NA
  12      NA         NA
  13      0          NA
  14      0          NA
  15      0          NA

After the first NA occurs in the "Value" column, all following values of the "Seq.Count" column are NA.

Why is this?

According to these lines from the code:

    ifelse(is.na(df$Value[i]),
           ifelse(!is.na(df$Value[i-1]), df$Seq.Count[i-1]+1, df$Seq.Count[i-1]), ...

it should simply take the value from

Seq.Count[i-1]

and add 1 to it, if

is.na(df$Value[i])

and

!is.na(df$Value[i-1])

Why does this not work?
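
One likely explanation, shown here only as a small sketch: when df$Value[i] is NA, the outermost test df$Value[i] == 10 is itself NA, and ifelse() returns NA for an NA test without using either branch, so the is.na() check further down is never reached. After that, rows with a real Value still compare against an NA neighbour (for example NA != 0), or copy an NA counter, so the NA carries forward. The variable names below are made up for the illustration.

```r
x <- NA

# Comparing NA with a number yields NA, not FALSE
cmp <- x == 10
print(cmp)        # NA

# ifelse() returns NA when its test is NA; neither branch is used
bad <- ifelse(x == 10, "ten", "not ten")
print(bad)        # NA

# Testing is.na() first means the comparison is never reached for NA
good <- ifelse(is.na(x), "missing", ifelse(x == 10, "ten", "not ten"))
print(good)       # "missing"

# NA also propagates through arithmetic such as Seq.Count[i-1] + 1
nxt <- NA + 1
print(nxt)        # NA
```

This suggests putting the is.na() test at the top of the nested ifelse, or guarding each comparison with !is.na(), rather than at the bottom.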

Thanks for your help.
