Monday, 24 June 2019

Why is R producing hidden duplicate columns when I use an ifelse statement?

For context, I'm trying to determine if someone had an overall increase in score in at least one of five factors assessed in pre/post assessments.

I created five columns of "Positive" or "Not" values to record whether each factor score had increased or not. Some values are missing because some people had incomplete pre or post data.

I created a column to determine if at least one factor in the row was positive, using this code:

MSWC$Overall <- ifelse(MSWC$Factor1 == "Positive" | MSWC$Factor2 == "Positive" | MSWC$Factor3 == "Positive" | MSWC$Factor4 == "Positive" | MSWC$Factor5 == "Positive", "Positive", "Same/Neg")

Output:

Factor1   Factor2   Factor3   Factor4  Factor5   Overall
Positive  Not       Not       NA       Positive  Positive
Not       NA        NA        Positive Not       Positive
Not       Not       Not       Not      Not       Not
NA        NA        NA        NA       NA        NA
Not       NA        NA        Not      Not       NA
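
The NA in the last row is consistent with how R treats missing values in logical expressions: NA | TRUE is TRUE, but NA | FALSE is NA, and ifelse() returns NA wherever its test is NA. A minimal sketch, using two hypothetical vectors (row2 and row5, which mirror rows 2 and 5 of the output above) and a hypothetical helper any_positive:

```r
# Hypothetical vectors mirroring rows 2 and 5 of the output table.
row2 <- c("Not", NA, NA, "Positive", "Not")
row5 <- c("Not", NA, NA, "Not", "Not")

# Chain the comparisons with `|`, as the ifelse() call above does.
any_positive <- function(x) Reduce(`|`, x == "Positive")

# One TRUE is enough for `|` to return TRUE, even next to NA values.
r2 <- ifelse(any_positive(row2), "Positive", "Same/Neg")  # "Positive"

# With only FALSE and NA, the result of `|` is NA, so ifelse() returns NA.
r5 <- ifelse(any_positive(row5), "Positive", "Same/Neg")  # NA
```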

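This code was obviously not perfect, as it returned NA instead of a code for rows that had no positive values but some missing ones (see the last row). To handle this, I used the following statement to create a column that counts the non-missing values in each row, so I can find rows where every value is missing: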

MSWC$Meh <- rowSums(ifelse(is.na(MSWC[,16:24]) == FALSE, 1, 0))

That count then feeds into a second column, which codes the rows that should be listed as "Not":

MSWC$Outcomess <- ifelse(is.na(MSWC$Overall_Positive) & MSWC$Meh > 0, "Same/Neg", MSWC$Overall_Positive)

This column contains exactly what I needed, and the last row is now coded instead of NA:

Factor1   Factor2   Factor3   Factor4  Factor5   Overall   Outcomess
Positive  Not       Not       NA       Positive  Positive  Positive
Not       NA        NA        Positive Not       Positive  Positive
Not       Not       Not       Not      Not       Not       Not
NA        NA        NA        NA       NA        NA        NA
Not       NA        NA        Not      Not       NA        Not
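
For comparison, the same result can probably be reached in one pass by counting matches with rowSums(..., na.rm = TRUE). This is only a sketch under assumptions: factors is a hypothetical data frame that mirrors the table above, its columns are plain character vectors, and the labels follow the ifelse() code ("Same/Neg") rather than the "Not" shown in the table.

```r
# Hypothetical data frame mirroring the five factor columns in the table.
factors <- data.frame(
  Factor1 = c("Positive", "Not", "Not", NA, "Not"),
  Factor2 = c("Not", NA, "Not", NA, NA),
  Factor3 = c("Not", NA, "Not", NA, NA),
  Factor4 = c(NA, "Positive", "Not", NA, "Not"),
  Factor5 = c("Positive", "Not", "Not", NA, "Not"),
  stringsAsFactors = FALSE
)

# Number of "Positive" values per row, ignoring missing values.
n_pos <- rowSums(factors == "Positive", na.rm = TRUE)

# Number of non-missing values per row.
n_seen <- rowSums(!is.na(factors))

# NA only when the whole row is missing; otherwise code the row.
outcome <- ifelse(n_seen == 0, NA_character_,
                  ifelse(n_pos > 0, "Positive", "Same/Neg"))
```

Because both counts are ordinary vectors with one value per row, outcome is also an ordinary vector and can be assigned to a single column.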

The problem is that now, when I export this data, I get five duplicate "overall" columns and five duplicate "outcomess" columns (one for each factor). They're easy to remove in Excel, but they don't show up in the environment, so I can't remove them with an MSWC[,-c(5:10)] statement.

Why is this happening with my data?
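
The post does not show enough to confirm the cause. One situation known to produce this symptom is a data frame column that holds a matrix: ifelse() returns a result with the same shape as its test, so a test built from several columns at once yields a matrix, which counts as one column in the environment but is expanded on export. The sketch below reproduces that situation on a hypothetical toy data frame (df, flag, and tmp are made-up names, not part of MSWC) and shows one way to check for such columns.

```r
# Hypothetical toy data frame; not the original MSWC data.
df <- data.frame(
  a = c("Positive", "Not", NA),
  b = c("Not", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() on several columns returns a matrix, and ifelse() keeps that shape,
# so this stores a 3 x 2 matrix inside a single data frame column.
df$flag <- ifelse(is.na(df[, 1:2]), "missing", "present")

ncol(df)       # 3: the matrix counts as one column in the environment
dim(df$flag)   # 3 2

# Exporting expands the matrix column into one CSV column per matrix column.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
exported_cols <- ncol(read.csv(tmp))   # 4

# Diagnostic: which columns carry a dim attribute (matrix or data frame)?
has_dim <- sapply(df, function(col) !is.null(dim(col)))
names(df)[has_dim]   # "flag"
```

If a check like this flags a column, it can be dropped by name (for example df$flag <- NULL) or rebuilt from a test that has one value per row.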
