Thursday, 23 June 2016

Ifelse in R with missing variables

1. Using the following columns:

    s1 <- c(1, 2, 4, 2, 3, 4, 2, 3)
    s2 <- c(2, 3, 1, 1, 4, 3, 3, 5)
    s3 <- c(3, 4, 2, 4, 1, 2, 1, 4)
    s5 <- c(4, 1, 3, 3, 2, 1, 4, 2)
    s6 <- c(5, 5, 5, 5, 5, 5, 5, 1)
    # there is no s4: that column is absent from this dataset
    samples <- cbind(s1, s2, s3, s5, s6)
    samples <- data.frame(samples)

  2. I wrote the following code:

    samples$r1 <- ifelse(samples$s1 == 1, "s1",
                  ifelse(samples$s2 == 1, "s2",
                  ifelse(samples$s3 == 1, "s3",
                  ifelse(samples$s5 == 1, "s5",
                  ifelse(samples$s6 == 1, "s6", "99")))))

  3. This gives the following result:

      s1 s2 s3 s5 s6 r1
    1  1  2  3  4  5 s1
    2  2  3  4  1  5 s5
    3  4  1  2  3  5 s2
    4  2  1  4  3  5 s2
    5  3  4  1  2  5 s3
    6  4  3  2  1  5 s5
    7  2  3  1  4  5 s3
    8  3  5  4  2  1 s6

  4. So far, so good...

  5. I then add another condition to the code, on the variable s4...

    samples$r1 <- ifelse(samples$s1 == 1, "s1",
                  ifelse(samples$s2 == 1, "s2",
                  ifelse(samples$s3 == 1, "s3",
                  ifelse(samples$s4 == 1, "s4",  # new condition on s4
                  ifelse(samples$s5 == 1, "s5",
                  ifelse(samples$s6 == 1, "s6", "99"))))))

  6. ...which does not exist in the dataset. I now get the following result:

      s1 s2 s3 s5 s6   r1
    1  1  2  3  4  5   s1
    2  2  3  4  1  5 <NA>
    3  4  1  2  3  5   s2
    4  2  1  4  3  5   s2
    5  3  4  1  2  5   s3
    6  4  3  2  1  5 <NA>
    7  2  3  1  4  5   s3
    8  3  5  4  2  1 <NA>

  7. No error message is reported, but referring to s4, a column that does not exist in the dataset, corrupts the output: r1 is NA in rows 2, 6 and 8, where I expected output identical to the first result above.

Referencing a non-existent column caused this problem, and I am struggling to find a way around it.
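
The `NA` values can be traced step by step. The sketch below rebuilds the data from point 1 so that it runs on its own and reflects my reading of base R's behaviour: `samples$s4` is `NULL` (with no error), `NULL == 1` is a zero-length logical vector, an `ifelse()` with a zero-length test returns a zero-length result, and the enclosing `ifelse()` then recycles that empty `no` value to `NA` (per `?rep`, a zero-length vector extended with `length.out` is filled with `NA`). The variable names `missing_col`, `s4_test`, `s4_level`, `s3_level` and `full` are illustrative only.

```r
# Rebuild the example data; note that there is no s4 column
samples <- data.frame(
  s1 = c(1, 2, 4, 2, 3, 4, 2, 3),
  s2 = c(2, 3, 1, 1, 4, 3, 3, 5),
  s3 = c(3, 4, 2, 4, 1, 2, 1, 4),
  s5 = c(4, 1, 3, 3, 2, 1, 4, 2),
  s6 = c(5, 5, 5, 5, 5, 5, 5, 1)
)

missing_col <- samples$s4            # `$` on an absent column returns NULL
print(is.null(missing_col))          # TRUE

s4_test <- missing_col == 1          # NULL == 1 is logical(0)
print(length(s4_test))               # 0

s4_level <- ifelse(s4_test, "s4", "99")  # zero-length test -> zero-length result
print(length(s4_level))              # 0

# One level up, rows where s3 != 1 take their value from the empty
# `no` argument, which ifelse() recycles to NA
s3_level <- ifelse(samples$s3 == 1, "s3", s4_level)

# Completing the outer levels reproduces the NA rows of the second result
full <- ifelse(samples$s1 == 1, "s1",
        ifelse(samples$s2 == 1, "s2", s3_level))
print(which(is.na(full)))
```

The same recycling happens whatever comes after the s4 level, which is why the s5 and s6 conditions are never reached for rows 2, 6 and 8.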

In Oracle SQL I would have used `CASE WHEN EXISTS`, but this option is not available in the sqldf package in R.
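
One base-R analogue of `CASE WHEN EXISTS` is to test `"s4" %in% names(samples)` before reading the column. A minimal sketch, assuming the data from point 1 and the same priority order as the nested `ifelse()`: `col_equals()` is a hypothetical helper introduced here, not part of base R or sqldf. For an absent column it returns `FALSE` on every row, so the nested `ifelse()` falls through to the next condition instead of producing `NA`.

```r
samples <- data.frame(
  s1 = c(1, 2, 4, 2, 3, 4, 2, 3),
  s2 = c(2, 3, 1, 1, 4, 3, 3, 5),
  s3 = c(3, 4, 2, 4, 1, 2, 1, 4),
  s5 = c(4, 1, 3, 3, 2, 1, 4, 2),
  s6 = c(5, 5, 5, 5, 5, 5, 5, 1)
)

# Hypothetical helper: TRUE where column `col` exists in `df` and equals
# `value`; FALSE on every row when the column is absent. NA cells in an
# existing column are also treated as "no match".
col_equals <- function(df, col, value = 1) {
  if (!(col %in% names(df))) {
    return(rep(FALSE, nrow(df)))
  }
  x <- df[[col]]
  !is.na(x) & x == value
}

samples$r1 <- ifelse(col_equals(samples, "s1"), "s1",
              ifelse(col_equals(samples, "s2"), "s2",
              ifelse(col_equals(samples, "s3"), "s3",
              ifelse(col_equals(samples, "s4"), "s4",  # absent: all FALSE
              ifelse(col_equals(samples, "s5"), "s5",
              ifelse(col_equals(samples, "s6"), "s6", "99"))))))
print(samples)
```

Treating `NA` cells as "no match" is a choice made for this sketch; drop the `!is.na(x)` part if `NA` should propagate instead.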

  8. This is a simplified version of a real-life problem where I need code that runs smoothly even though the input variables differ from time to time. Column s4 is not in this dataset, but it may appear in the next dataset I run this code on, so the code has to allow for that.

  9. I have tried using `exists()`:

    samples$r1 <- ifelse(exists(samples$s1 == 1, "s1"),
                  ifelse(exists(samples$s2 == 1, "s2"),
                  ifelse(exists(samples$s3 == 1, "s3"),
                  ifelse(exists(samples$s4 == 1, "s4"),
                  ifelse(exists(samples$s5 == 1, "s5"),
                  ifelse(exists(samples$s6 == 1, "s6"), "99"))))))

But this is evidently too simple and does not solve the problem. I have also searched Stack Overflow extensively without finding a solution, and as far as I can tell `?exists` in R does not provide the help I am looking for either.
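
Because the set of columns varies between datasets, an alternative is to drop the hand-written nesting and drive the lookup from a vector of candidate names. This is a sketch assuming the priority order s1, s2, s3, s4, s5, s6 from the question; `first_match()` is a hypothetical helper, not an existing function. It keeps only the candidates present in the data and walks them from lowest to highest priority, overwriting the result wherever a column equals 1, so the earliest matching column wins, as in the nested `ifelse()`. Note that `exists()` answers a different question: it looks up a variable by name in an environment (in the script above `exists("s1")` would be `TRUE` only because a global vector `s1` was created), so it cannot tell whether a data frame has a given column.

```r
samples <- data.frame(
  s1 = c(1, 2, 4, 2, 3, 4, 2, 3),
  s2 = c(2, 3, 1, 1, 4, 3, 3, 5),
  s3 = c(3, 4, 2, 4, 1, 2, 1, 4),
  s5 = c(4, 1, 3, 3, 2, 1, 4, 2),
  s6 = c(5, 5, 5, 5, 5, 5, 5, 1)
)

# Hypothetical helper: for each row, the name of the first column in `cols`
# (priority order) whose value equals `value`, or `default` if none does.
# Names in `cols` that are not columns of `df` are skipped.
first_match <- function(df, cols, value = 1, default = "99") {
  present <- cols[cols %in% names(df)]
  out <- rep(default, nrow(df))
  # Lowest priority first, so a match in an earlier column overwrites it
  for (col in rev(present)) {
    hit <- !is.na(df[[col]]) & df[[col]] == value
    out[hit] <- col
  }
  out
}

priority <- c("s1", "s2", "s3", "s4", "s5", "s6")
samples$r1 <- first_match(samples, priority)
print(samples)

# Hypothetical later dataset that does contain s4 (values made up for
# illustration): the same call now picks s4 up without any code change
samples2 <- samples
samples2$s4 <- c(9, 9, 9, 9, 9, 1, 9, 9)
r_with_s4 <- first_match(samples2, priority)
print(r_with_s4)
```

Keeping the priority order in one vector means a new column only requires adding its name there, rather than another level of nesting.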
