I have exported Survey Monkey data which, for each question, produces a separate column for each option and fills it with a character value if the respondent selected that response; otherwise it is NA (see the data frame below).
I would like to create a new binary column based on the same condition across multiple columns.
diag <- structure(list(
  diag_stress_fracture = c(NA, "Stress fracture(s)", NA, NA, NA, NA),
  diag_disordered_eating = c(NA_character_, NA_character_, NA_character_,
                             NA_character_, NA_character_, NA_character_),
  diag_asthma = c(NA, "Asthma", NA, NA, NA, NA),
  diag_low_bone_density = c(NA_character_, NA_character_, NA_character_,
                            NA_character_, NA_character_, NA_character_),
  diag_acl_rupture = c(NA_character_, NA_character_, NA_character_,
                       NA_character_, NA_character_, NA_character_),
  diag_concussion = c(NA, "Concussion", NA, NA, NA, NA),
  diag_depression_or_anxiety = c(NA_character_, NA_character_, NA_character_,
                                 NA_character_, NA_character_, NA_character_),
  diag_haemochromatosis = c(NA_character_, NA_character_, NA_character_,
                            NA_character_, NA_character_, NA_character_),
  diag_hypothyroidism = c(NA_character_, NA_character_, NA_character_,
                          NA_character_, NA_character_, NA_character_),
  diag_oligomenorrhea_or_amenorrhoea = c(NA_character_, NA_character_, NA_character_,
                                         NA_character_, NA_character_, NA_character_)
), .Names = c("diag_stress_fracture", "diag_disordered_eating",
              "diag_asthma", "diag_low_bone_density", "diag_acl_rupture",
              "diag_concussion", "diag_depression_or_anxiety",
              "diag_haemochromatosis", "diag_hypothyroidism",
              "diag_oligomenorrhea_or_amenorrhoea"),
row.names = c(NA, 6L), class = "data.frame")
Essentially, I want to know whether a participant has had a diagnosis, regardless of what it is. I can get my desired outcome using the following code (where ... stands for the remaining columns of interest, which I have truncated for this example):
library(dplyr)

diag <- diag %>%
  mutate(diag.yn = ifelse(!is.na(diag_stress_fracture) |
                            !is.na(diag_disordered_eating) |
                            !is.na(diag_asthma) | ... , 1, 0))
However, this is obviously very clunky and time-consuming, given that I would like to do this for multiple questions. Is there a way of doing this using column positions? For example, these columns are at positions 38:47 in my large data set.
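One possible approach, shown here only as a minimal base R sketch: count the non-missing values per row over a range of column positions with rowSums(), then flag rows where that count is above zero. The three-column toy data frame and the cols vector are assumptions made for illustration; in the full data set cols would presumably be 38:47.

```r
# Toy data (hypothetical subset): a character value if the option was
# selected, NA otherwise.
diag <- data.frame(
  diag_stress_fracture = c(NA, "Stress fracture(s)", NA, NA, NA, NA),
  diag_asthma          = c(NA, "Asthma", NA, NA, NA, NA),
  diag_concussion      = c(NA, "Concussion", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Column positions of interest (assumption: 1:3 here, 38:47 in the full data).
cols <- 1:3

# is.na() on a data frame returns a logical matrix; rowSums() counts the
# non-missing cells in each row, and "> 0" flags rows with at least one answer.
diag$diag.yn <- as.integer(rowSums(!is.na(diag[, cols, drop = FALSE])) > 0)

diag$diag.yn
# [1] 0 1 0 0 0 0
```

Because the selection is by position, the same line can be reused for other questions by changing cols. drop = FALSE keeps the subset a data frame even if a question has only one option column.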