I have a dataframe with 3 variables (subjects (75), environment (14), rating (0-100)) and 2100 observations.
Some ratings are missing because the subjects didn't answer. I would like to go through my ratings column and replace the missing values with the mean of the observed ratings in the corresponding environment.
For example, if a subject didn't give a rating for environment 2, then I would like to replace his missing value with the mean of all the observed ratings for environment 2.
I'm quite new to R, but this is the code that I came up with:
    for (j in length(MyData$Rating)) {
      for (i in env.list) {
        if (is.na(MyData$Rating[j]) && MyData$Environment[j] == i) {
          MyData$Rating[j] = mean(MyData.obs$Rating[which(MyData.obs$Environment == i)])
        }
      }
    }
env.list is a list with the names of the different environments, and MyData.obs is a dataframe where I just removed all missing values.
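As a side note, the per-environment means that the loop looks up in MyData.obs can also be computed straight from the original dataframe, without a separate copy that has the missing values removed. The following is only a minimal sketch: the toy values are made up for illustration, and `env.means` is a hypothetical name that does not appear in the code above.

```r
# Toy data standing in for MyData (values are invented for illustration)
MyData <- data.frame(
  Subject     = c(1, 1, 2, 2, 3, 3),
  Environment = c("env1", "env2", "env1", "env2", "env1", "env2"),
  Rating      = c(10, 40, NA, 60, 30, NA)
)

# Mean of the observed ratings in each environment;
# na.rm = TRUE makes mean() skip the missing values
env.means <- tapply(MyData$Rating, MyData$Environment, mean, na.rm = TRUE)
print(env.means)
```

The result is indexed by environment name, so `env.means["env2"]` gives the value that would replace a missing rating in environment 2.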
Although my individual lines of code seem to work and I don't get any errors, the MyData dataframe does not get updated. Running the code doesn't seem to do anything.
So my question is: why is my dataframe not being updated? And if you have other suggestions for how to replace my missing values, I would be glad to hear those as well!
Thanks!
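One likely explanation, offered as a sketch rather than a definitive diagnosis: `length(MyData$Rating)` returns a single number, so `for (j in length(...))` runs its body only once, for the last row. If that row is not missing, nothing changes. Iterating over `seq_along(MyData$Rating)` visits every row instead. The example below uses made-up toy data; `visited`, `fixed`, `same.env` and `vectorised` are hypothetical names introduced for illustration, and the `ave()` version is one possible loop-free alternative, not the only one.

```r
# Toy data standing in for MyData (values are invented for illustration)
MyData <- data.frame(
  Environment = c("env1", "env2", "env1", "env2", "env1", "env2"),
  Rating      = c(10, 40, NA, 60, 30, NA)
)

# length() returns one number, so this loop body runs once, with j = 6
visited <- c()
for (j in length(MyData$Rating)) visited <- c(visited, j)

# seq_along() yields 1, 2, ..., n, so every row is checked
fixed <- MyData
for (j in seq_along(fixed$Rating)) {
  if (is.na(fixed$Rating[j])) {
    # Rows that belong to the same environment as row j
    same.env <- MyData$Environment == MyData$Environment[j]
    # Mean of the observed (non-missing) ratings in that environment
    fixed$Rating[j] <- mean(MyData$Rating[same.env], na.rm = TRUE)
  }
}

# Loop-free alternative: ave() applies FUN within each environment
# and returns the results in the original row order
vectorised <- MyData
vectorised$Rating <- ave(
  MyData$Rating, MyData$Environment,
  FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)
)
```

Both `fixed` and `vectorised` leave the observed ratings untouched and fill only the missing ones. The means are always taken from the original `MyData$Rating`, so values that were already filled in do not influence later ones.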