I've just transitioned to using R from SAS and I'm working with a very large data set (half a million observations and 20 thousand variables) that needs quite a bit of recoding. I imagine this is a pretty basic question, but I'm still learning so I'd really appreciate any guidance!
Many of the variables have three instances and each instance has multiple arrays. For this problem, I am using the "History of Father's Illness." There are many illnesses included, but I am primarily interested in CAD (coded as "1").
An example of how the data looks:
n_20107_0_0  n_20107_0_1  n_20107_0_2
NA           NA           NA
7            1            8
4            6            1
I've only included 3 arrays here, but in reality there are close to 20. I did a bit of research and determined that the most efficient way to do this would be to create a list with the variables and then use lapply. This is what I have attempted:
FatherDisease1 <- paste("n_20107_0_", 0:3, sep = "")
lapply(FatherDisease1, transform, FatherCAD_0_0 = ifelse(FatherDisease1 == 1, 1, 0))
I don't quite get the results I am looking for when I do this.
n_20107_0_0  n_20107_0_1  n_20107_0_2  FatherCAD_0_0
NA           NA           NA           0
7            1            8            0
4            6            1            0
What I would like to do is go through all 3 of these columns and, if the person answered 1 in any of them, set "FatherCAD_0_0" to 1; if not, set "FatherCAD_0_0" to 0. Instead, I only ever end up with 0's. As for the NAs, I would like them to stay as NAs. This is what I would like it to look like:
n_20107_0_0  n_20107_0_1  n_20107_0_2  FatherCAD_0_0
NA           NA           NA           NA
7            1            8            1
4            6            1            1
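For reference, the rule described above can be sketched in base R on a toy data frame. This is only a sketch under a few assumptions: the data frame name `df`, the helper names `father_cols`, `is_cad`, and `all_missing`, and the fourth row (a person with no 1 in any array) are made up for illustration, and a row is treated as NA only when every array is NA. Note that `FatherDisease1` in the attempt above is a character vector of column names, so `FatherDisease1 == 1` compares the names themselves to 1 rather than the column values, which would explain the all-0 result.

```r
# Toy data mirroring the example; the 4th row is a hypothetical extra case.
df <- data.frame(
  n_20107_0_0 = c(NA, 7, 4, 2),
  n_20107_0_1 = c(NA, 1, 6, 3),
  n_20107_0_2 = c(NA, 8, 1, NA)
)

# Column names for instance 0; widen the range to cover the real arrays.
father_cols <- paste0("n_20107_0_", 0:2)

# Logical matrix: TRUE where an array equals 1 (CAD), NA where it is missing.
is_cad <- df[father_cols] == 1

# Rows in which every array is NA (assumed here to mean "no answer at all").
all_missing <- rowSums(!is.na(df[father_cols])) == 0

# 1 if any array equals 1, 0 otherwise, NA when nothing was answered.
df$FatherCAD_0_0 <- ifelse(
  all_missing,
  NA_integer_,
  as.integer(rowSums(is_cad, na.rm = TRUE) > 0)
)

print(df)
```

Because `rowSums` works on the whole matrix at once, the same lines can be reused for the other instances by changing the prefix passed to `paste0` and the name of the new column.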
I've figured out how to do this the "long" way (30+ lines of code -_-) but am trying to get better at writing more elegant and efficient code. Any help would be greatly appreciated!!