Monday, May 7, 2018

Replacing missing values with the mean of the observed values [duplicate]


I have a data frame with 3 variables (subject: 75 subjects; environment: 14 environments; rating: 0 to 100) and 2100 observations.

Some ratings are missing because the subjects didn't answer. I would like to go through my ratings column and replace the missing values with the mean of the observed ratings in the corresponding environment.

For example, if a subject didn't give a rating for environment 2, then I would like to replace his missing value with the mean of all the observed ratings for environment 2.
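The replacement values described here, the mean of the observed ratings in each environment, can be computed in a single call with base R's `tapply()`. Below is a minimal sketch on made-up toy data. It assumes the column names `Environment` and `Rating` used in the loop below and keeps only those two columns for brevity:

```r
# Hypothetical toy data: two environments, one missing rating in each
MyData <- data.frame(
  Environment = c("env1", "env1", "env1", "env2", "env2", "env2"),
  Rating      = c(40, NA, 60, 10, 30, NA)
)

# Mean of the observed ratings per environment; na.rm = TRUE makes mean() skip NAs
env.means <- tapply(MyData$Rating, MyData$Environment, mean, na.rm = TRUE)
print(env.means)  # named values: env1 = 50, env2 = 20
```

Each missing rating would then take the value stored under its environment's name.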

I'm quite new to R, but this is the code that I came up with:

for (j in length(MyData$Rating)) {
  for (i in env.list) {
    if (is.na(MyData$Rating[j]) && MyData$Environment[j] == i) {
      MyData$Rating[j] = mean(MyData.obs$Rating[which(MyData.obs$Environment == i)])
    }
  }
}

env.list is a list of the names of the different environments, and MyData.obs is a copy of the data frame with all missing values removed.
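The question does not show how env.list and MyData.obs are created. Here is a minimal sketch of one way to build them. It assumes the `Environment` and `Rating` columns from the loop above and uses made-up toy data, so it is not the asker's actual code:

```r
# Hypothetical toy data standing in for MyData
MyData <- data.frame(
  Environment = c("env1", "env1", "env2", "env2"),
  Rating      = c(40, NA, 10, 30)
)

# Distinct environment names; a character vector works in a for loop just like a list
env.list <- unique(as.character(MyData$Environment))

# Keep only the rows whose rating was observed
MyData.obs <- MyData[!is.na(MyData$Rating), ]
```

Filtering on `!is.na(MyData$Rating)` only drops rows with a missing rating. `na.omit(MyData)` would also drop rows that have a missing value in any other column, which may not be what is wanted.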

Although the individual lines of code seem to work and I don't get any errors, the MyData data frame does not seem to be updated: running the code appears to do nothing.
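A likely cause is the outer loop header. `length(MyData$Rating)` returns a single number (2100 for the full data), so `for (j in length(MyData$Rating))` runs its body exactly once, with `j` set to the last row index. Only the last row is ever checked, and if that rating is not missing, nothing changes. Iterating over `seq_along(MyData$Rating)` visits every row instead. The sketch below demonstrates this on hypothetical toy data:

```r
# Hypothetical toy data standing in for MyData
MyData <- data.frame(
  Environment = c("env1", "env1", "env2", "env2"),
  Rating      = c(40, NA, NA, 30)
)
env.list   <- unique(as.character(MyData$Environment))
MyData.obs <- MyData[!is.na(MyData$Rating), ]

# length() yields one value, so this loop body runs only once, with j = 4
visited <- integer(0)
for (j in length(MyData$Rating)) visited <- c(visited, j)
print(visited)  # [1] 4

# Corrected loop: seq_along() yields 1, 2, ..., n, so every row is checked
for (j in seq_along(MyData$Rating)) {
  for (i in env.list) {
    if (is.na(MyData$Rating[j]) && MyData$Environment[j] == i) {
      MyData$Rating[j] <- mean(MyData.obs$Rating[which(MyData.obs$Environment == i)])
    }
  }
}
print(MyData$Rating)  # [1] 40 40 30 30
```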

So my question is: why is my data frame not being updated? Also, if you have other suggestions for replacing the missing values, I would be glad to hear them!
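As an alternative to nested loops, base R's `ave()` applies a function within groups and returns a vector aligned row by row with the input. This means the whole replacement fits in two assignments, with no separate MyData.obs needed. Here is a sketch, assuming the `Environment` and `Rating` column names from the question's loop and made-up toy data:

```r
# Hypothetical toy data standing in for MyData
MyData <- data.frame(
  Environment = c("env1", "env1", "env1", "env2", "env2", "env2"),
  Rating      = c(40, NA, 60, 10, 30, NA)
)

# For each row, the mean of the observed ratings in that row's environment
env.mean <- ave(MyData$Rating, MyData$Environment,
                FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing ratings; observed ratings are kept as they are
MyData$Rating <- ifelse(is.na(MyData$Rating), env.mean, MyData$Rating)
print(MyData$Rating)  # [1] 40 50 60 10 30 20
```

One caveat: if an environment has no observed ratings at all, `mean(x, na.rm = TRUE)` returns `NaN` for it, so those rows would still lack a usable value.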

Thanks!
