Wednesday, 29 May 2019

Create a binary column based on one condition across multiple columns [duplicate]

I have exported SurveyMonkey data which, for each question, produces a separate column for each option. Each column holds a character value if the respondent selected that option and NA otherwise (see the data frame below).

I would like to create a new binary column based on the same condition across multiple columns.

diag <- structure(list(
  diag_stress_fracture = c(NA, "Stress fracture(s)", NA, NA, NA, NA),
  diag_disordered_eating = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_asthma = c(NA, "Asthma", NA, NA, NA, NA),
  diag_low_bone_density = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_acl_rupture = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_concussion = c(NA, "Concussion", NA, NA, NA, NA),
  diag_depression_or_anxiety = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_haemochromatosis = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_hypothyroidism = c(NA_character_, NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_),
  diag_oligomenorrhea_or_amenorrhoea = c(NA_character_, NA_character_,
    NA_character_, NA_character_, NA_character_, NA_character_)),
  .Names = c("diag_stress_fracture", "diag_disordered_eating",
    "diag_asthma", "diag_low_bone_density", "diag_acl_rupture",
    "diag_concussion", "diag_depression_or_anxiety",
    "diag_haemochromatosis", "diag_hypothyroidism",
    "diag_oligomenorrhea_or_amenorrhoea"),
  row.names = c(NA, 6L), class = "data.frame")

Essentially I want to know if a participant has had a diagnosis, regardless of what it is. I can get my desired outcome using the following code (where ... are the above columns of interest but I have truncated for this example):

library(dplyr)  # provides %>% and mutate()

diag <- diag %>%
  mutate(diag.yn = ifelse(!is.na(diag_stress_fracture) |
                          !is.na(diag_disordered_eating) |
                          !is.na(diag_asthma) | ... , 1, 0))

However, this is clunky and time-consuming, given that I would like to do this for multiple questions. Is there a way of doing this using column positions? For example, these columns are 38:47 in my large data set.
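
One possible way to express "any non-NA value across a block of columns" by position is sketched below in base R. This is only an illustration: the toy data frame `df`, its column names, and the helper `flag_any` are hypothetical, made up for this example, and are not part of the survey export.

```r
# Toy data standing in for the survey export (hypothetical column names).
df <- data.frame(
  id  = 1:4,
  q_a = c(NA, "A", NA, NA),
  q_b = c(NA, NA, NA, "B"),
  q_c = c(NA, "C", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns 1 for rows with at least one non-NA value
# in `cols`, and 0 otherwise. `cols` can be integer positions (e.g. 38:47)
# or a character vector of column names.
flag_any <- function(data, cols) {
  # !is.na() on a data frame gives a logical matrix; rowSums() counts
  # the non-NA cells in each row.
  as.integer(rowSums(!is.na(data[, cols, drop = FALSE])) > 0)
}

# Columns 2 to 4 hold the options for one question in this toy example.
df$diag.yn <- flag_any(df, 2:4)
df$diag.yn
```

Under the same assumptions, the call for the example data frame would be `diag$diag.yn <- flag_any(diag, 1:10)`, and for the full data set the positions `38:47` would be passed instead. Because the helper only takes a data frame and a set of columns, it could be reused for each question's block of columns.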
