Saturday, June 6, 2020

In R, classifying a cell that fulfills multiple string conditions

I am fairly new to R and am using a sample dataset to practice writing ifelse statements. In my Excel spreadsheet, the column headed "ethnicity" holds numeric codes recording how each individual self-identified. The codes and their corresponding ethnicities are:

1 - Asian
2 - Black or African American
3 - Hispanic or Latino
4 - Native American or American Indian
5 - Other (for those who identify with an ethnic group not listed)
6 - Native Hawaiian or Pacific Islander
7 - Caucasian
8 - Uncertain (for those who are unsure of their ethnicity or what ethnic group they identify with)
9 - Prefer not to answer (chose not to answer)
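
One way to keep this code list next to the analysis is a named character vector that maps each code (as a string) to its label. This is a minimal sketch; the name eth_labels is hypothetical and not part of the original data.

```r
# Hypothetical lookup table: code (as a string) -> self-identified ethnicity
eth_labels <- c(
  "1" = "Asian",
  "2" = "Black or African American",
  "3" = "Hispanic or Latino",
  "4" = "Native American or American Indian",
  "5" = "Other",
  "6" = "Native Hawaiian or Pacific Islander",
  "7" = "Caucasian",
  "8" = "Uncertain",
  "9" = "Prefer not to answer"
)

# Look up labels by code
unname(eth_labels[c("2", "7")])  # "Black or African American" "Caucasian"
```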

I load my file, cleaned-demographic-raw_data.csv, with read.csv and assign it to the variable DataAll_Analytical:

```r
DataAll_Analytical <- read.csv(".../Qualtrics_Raw-Clean_2019/cleaned-demographic-raw_data.csv", header = TRUE, na.strings = c("NA"), stringsAsFactors = FALSE)
```
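
Multi-select answers such as "1,2,7" only survive as text if the ethnicity column is read as character. The sketch below uses a hypothetical inline sample in place of the real file and assumes the column is named eth, as in the code that follows; colClasses forces that column to character even in a file where every cell happens to hold a single code.

```r
# Hypothetical inline sample standing in for the real CSV; the quoted
# field "1,2,7" holds several codes from a multi-select question
csv_text <- 'id,eth
1,"2"
2,"1,2,7"
3,"9"'

sample_data <- read.csv(
  text = csv_text,
  na.strings = c("NA"),
  stringsAsFactors = FALSE,
  # Force eth to character so it stays text even if no cell contains a comma
  colClasses = c(eth = "character")
)

str(sample_data)
```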

My goal is to create binary indicator columns by extracting the numeric codes above as strings: an individual gets a value of 1 (true) if they self-identified with the corresponding ethnic group, or 0 (false) if they did not. My current approach is:
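
A minimal sketch of a single indicator, assuming the codes in each cell are separated only by commas: splitting the cell and testing for the exact code avoids partial matches (for example, "1" would match inside "10" if two-digit codes were ever added). has_code() is a hypothetical helper written for illustration and uses only base R.

```r
# Hypothetical helper: 1 if `code` is one of the comma-separated codes in
# each cell of `eth`, 0 otherwise (missing cells also give 0)
has_code <- function(eth, code) {
  parts <- strsplit(as.character(eth), ",", fixed = TRUE)
  as.integer(vapply(parts, function(p) code %in% trimws(p), logical(1)))
}

eth <- c("2", "1,2,7", "9", NA)
has_code(eth, "2")  # 1 1 0 0
has_code(eth, "7")  # 0 1 0 0
```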

```r
library(stringr)  # provides str_extract()

# Started with 8 and 9 since those were easiest to account for
# 1 if the respondent is uncertain of their ethnicity (code 8), 0 otherwise
DataAll_Analytical$any_dont_know <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "8")), 0, 1)
# 1 if the respondent chose not to answer (code 9), 0 otherwise
DataAll_Analytical$no_answer <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "9")), 0, 1)
# 1 if Asian (code 1) is among the selected codes
DataAll_Analytical$any_asian <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "1")), 0, 1)
# 1 if Black or African American (code 2) is among the selected codes
DataAll_Analytical$any_black <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "2")), 0, 1)
# 1 if Hispanic or Latino (code 3) is among the selected codes
DataAll_Analytical$any_hispanic <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "3")), 0, 1)
# 1 if Native American or American Indian (code 4) is among the selected codes
DataAll_Analytical$any_native_american <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "4")), 0, 1)
# 1 if Other (code 5) is among the selected codes
DataAll_Analytical$any_other <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "5")), 0, 1)
# 1 if Native Hawaiian or Pacific Islander (code 6) is among the selected codes
DataAll_Analytical$any_hawaiian_pacific <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "6")), 0, 1)
# 1 if Caucasian (code 7) is among the selected codes
DataAll_Analytical$any_white <- ifelse(is.na(str_extract(DataAll_Analytical$eth, "7")), 0, 1)
```
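
The nine near-identical lines above can also be generated in a loop. This sketch uses base R's grepl() with a pattern anchored on commas or the ends of the string, so a code only counts when it is a whole comma-separated entry (it assumes no spaces around the commas). The data frame is a hypothetical stand-in for DataAll_Analytical, and the code-to-column mapping reuses the column names above.

```r
# Hypothetical stand-in for DataAll_Analytical
DataAll_Analytical <- data.frame(
  eth = c("2", "1,2,7", "9", NA),
  stringsAsFactors = FALSE
)

# Indicator column name for each code, reusing the names above
indicator_names <- c(
  "1" = "any_asian", "2" = "any_black", "3" = "any_hispanic",
  "4" = "any_native_american", "5" = "any_other",
  "6" = "any_hawaiian_pacific", "7" = "any_white",
  "8" = "any_dont_know", "9" = "no_answer"
)

for (code in names(indicator_names)) {
  # The code must be preceded by the start or a comma,
  # and followed by a comma or the end of the string
  pattern <- paste0("(^|,)", code, "(,|$)")
  # grepl() returns FALSE for NA cells, so missing answers become 0
  DataAll_Analytical[[indicator_names[[code]]]] <-
    as.integer(grepl(pattern, DataAll_Analytical$eth))
}

DataAll_Analytical
```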

However, I also want to account for individuals who identify as multiracial. For example, an individual who specified that they are Asian, Black, and Caucasian would have the string "1,2,7" in the corresponding cell. I would like to count multiracial individuals as a separate group rather than having them overlap with the predefined ethnic groups.
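
Counting how many distinct group codes each cell contains gives a simple multiracial flag. This sketch assumes that only codes 1 to 7 count as ethnic groups, so 8 (Uncertain) and 9 (Prefer not to answer) never make someone multiracial; that rule is an assumption, not part of the original coding scheme.

```r
eth <- c("2", "1,2,7", "2,3,6", "9", NA)

# Assumed rule: count distinct codes 1-7 (actual ethnic groups);
# 8 (Uncertain) and 9 (Prefer not to answer) are not counted
n_groups <- vapply(
  strsplit(eth, ",", fixed = TRUE),
  function(p) sum(unique(trimws(p)) %in% as.character(1:7)),
  integer(1)
)

multiracial <- as.integer(n_groups > 1)

n_groups     # 1 3 3 0 0
multiracial  # 0 1 1 0 0
```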

I considered using string extraction with if-else statements for this as well, but I am concerned about overlap or misclassification. Is there an approach that would sort someone who is multiracial and lists Black as one of their ethnicities (e.g. "2,3,6") into a "multiracial" category, while sorting someone who identifies only as Black ("2") into a Black-only group?
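
One way to avoid overlap is to assign each person exactly one mutually exclusive category: the group's label if exactly one of codes 1 to 7 was selected, "Multiracial" if more than one was, and the code 8 or 9 response otherwise. eth_group() and its precedence rules are hypothetical choices for illustration, not an established method.

```r
# Labels for the codes treated as ethnic groups (1-7)
eth_labels <- c(
  "1" = "Asian",
  "2" = "Black or African American",
  "3" = "Hispanic or Latino",
  "4" = "Native American or American Indian",
  "5" = "Other",
  "6" = "Native Hawaiian or Pacific Islander",
  "7" = "Caucasian"
)

# Hypothetical helper: one mutually exclusive category per cell
eth_group <- function(eth) {
  vapply(strsplit(eth, ",", fixed = TRUE), function(p) {
    p <- unique(trimws(p))
    groups <- p[p %in% names(eth_labels)]
    if (length(groups) > 1) {
      "Multiracial"
    } else if (length(groups) == 1) {
      unname(eth_labels[groups])
    } else if ("8" %in% p) {
      "Uncertain"
    } else if ("9" %in% p) {
      "Prefer not to answer"
    } else {
      NA_character_
    }
  }, character(1))
}

eth <- c("2", "1,2,7", "2,3,6", "9", NA)
group <- eth_group(eth)

# 1 only for people whose single selected group is Black
black_only <- as.integer(group %in% "Black or African American")
table(group, useNA = "ifany")
```

Because every row falls into exactly one category, a Black-only indicator is just a comparison against that column, and multiracial respondents who listed Black are no longer counted in it.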
