Wednesday, 6 January 2021

"the condition has length > 1 and only the first element will be used" warning from nested `if`/`else` statements over a data frame

I have a dataframe with two columns, df_headache_tibble:

structure(list(df_questionaire.headaches = c(0L, 2L, 2L, 2L, 
0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 2L, 2L, 2L, 
2L, 0L, 2L, 0L, 2L, 0L, 2L, NA, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, 
0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 
0L, 0L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 2L, 0L, 0L, 2L, 2L, 2L, 
0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 
2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 2L, 
0L, 2L, 2L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 0L, 0L, 
0L, 0L, 2L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 2L, 
2L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 
0L, 0L, 2L, 2L, 2L, 2L, 2L, 0L, 2L, 0L, 0L), df_questionaire.headaches_covid = c(0L, 
0L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 
2L, 2L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, NA, 2L, 2L, 0L, 0L, 0L, 
2L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 
0L, 2L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 
0L, 0L, 774L, 0L, 0L, 0L, 2L, 2L, 774L, 0L, 0L, 0L, 2L, 0L, 2L, 
0L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 
2L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 0L, 2L, 
2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
2L, 0L, 0L, 2L, 2L, 0L, 774L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 
2L, 0L, 2L, 774L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 774L, 0L, 0L, 
774L)), row.names = c(NA, -175L), class = c("tbl_df", "tbl", 
"data.frame"))

I wrote a function that, applied with mapply, should produce a character vector (Q10_incidence) of the same length as nrow(df_headache_tibble), based on nested conditions evaluated row by row. Q10_incidence[i] should be the result of applying the function to df_headache_tibble[i,1] and df_headache_tibble[i,2].

incidence_headaches <- function(x, y) {
  if (is.na(x) | is.na(y)) {
    output <- NA
  } else if (x == 2) {
    if (y == 2) {
      output <- 'previous_headache_maintained'
    } else if (y == 0) {
      output <- 'previous_headache_ceased'
    }
  } else if (x %in% c(0, 774, 775, 776)) {
    if (y == 2) {
      output <- 'new_onset_headache'
    } else if (y %in% c(0, 774, 775, 776)) {
      output <- 'no_headache'
    }
  }
}

Q10_incidence<-mapply(incidence_headaches, df_headache_tibble[,1], df_headache_tibble[,2])

When I call

mapply(incidence_headaches, df_headache_tibble[,1], df_headache_tibble[,2])

I get the dreaded "the condition has length > 1 and only the first element will be used" message in several warnings. How can I handle this? Although I found several questions about the same "condition has length (...)" warning, I still find the topic quite confusing. A "for dummies" walkthrough would be welcome.
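
One possible reading of the warning, sketched with a small base R data frame (the name `toy` and its values are made up for illustration): on a tibble, `[, 1]` returns a one-column tibble rather than a vector, which `drop = FALSE` mimics below. Since a data frame is a list of columns, mapply would then iterate over columns instead of rows, so the function is called once with a whole column as `x` and `if` receives a condition with many elements.

```r
# Hypothetical toy data standing in for the real tibble.
toy <- data.frame(before = c(0L, 2L, 2L, NA),
                  after  = c(2L, 2L, 0L, NA))

# On a tibble, `[, 1]` keeps the data-frame shape; drop = FALSE mimics that here.
one_col <- toy[, 1, drop = FALSE]
length(one_col)      # 1: a list holding a single column
length(toy[[1]])     # 4: a plain vector, one element per row

# mapply() iterates over list elements, so each call sees a whole column.
seen <- mapply(function(x, y) length(x),
               toy[, 1, drop = FALSE], toy[, 2, drop = FALSE])
seen                 # one result, reporting the length of the full column
```

Note that from R 4.2.0 onward a condition of length greater than one in `if` is an error rather than a warning.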

It seems to have something to do with vectorization, and could perhaps be solved by replacing the function with a nested ifelse() structure, but that could get very messy.
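
A vectorized version does not have to be a nested ifelse(). The following is only a sketch under assumptions: the name `incidence_vec` is hypothetical, and combinations that the original function leaves undefined (for example x == 2 with y == 774) are returned as NA here, which is a choice of this sketch rather than something stated above.

```r
# Hypothetical vectorized variant; takes two whole columns at once.
incidence_vec <- function(x, y) {
  none <- c(0, 774, 775, 776)            # codes treated as "no headache"
  out <- rep(NA_character_, length(x))   # stays NA wherever no rule matches

  # which() drops NA comparisons, so rows with missing values stay NA.
  out[which(x == 2 & y == 2)]           <- "previous_headache_maintained"
  out[which(x == 2 & y == 0)]           <- "previous_headache_ceased"
  out[which(x %in% none & y == 2)]      <- "new_onset_headache"
  out[which(x %in% none & y %in% none)] <- "no_headache"
  out
}

# Usage on made-up values:
res <- incidence_vec(c(2L, 2L, 0L, 774L, NA, 2L),
                     c(2L, 0L, 2L, 0L, 2L, 774L))
```

Each rule is one line, so adding or changing a category does not require re-nesting anything.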

I may need to use similar functions on many occasions, and I am not sure what the best workaround is.
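
For reuse with other scalar functions, one option that may work is to keep the if/else logic and hand mapply plain vectors instead of one-column tibbles. The sketch below assumes a hypothetical helper `classify_pair` and made-up data in `toy`; `[[` extracts a column as a vector from both data frames and tibbles.

```r
# Hypothetical scalar helper: expects exactly one value for x and one for y.
classify_pair <- function(x, y) {
  none <- c(0, 774, 775, 776)
  if (is.na(x) || is.na(y)) return(NA_character_)
  if (x == 2 && y == 2) return("previous_headache_maintained")
  if (x == 2 && y == 0) return("previous_headache_ceased")
  if (x %in% none && y == 2) return("new_onset_headache")
  if (x %in% none && y %in% none) return("no_headache")
  NA_character_  # no rule matched (a choice made for this sketch)
}

# Made-up data for illustration.
toy <- data.frame(before = c(0L, 2L, NA), after = c(2L, 0L, 2L))

# `[[` yields plain vectors, so mapply() calls the helper once per row.
result <- mapply(classify_pair, toy[[1]], toy[[2]])
```

Returning a value on every path also keeps mapply's result a simple character vector rather than a list.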
