Here is the deal. I was trying to use mutate from the plyr package to look up a value from another data frame whenever the v variable in the original data frame is NA. The looked-up value is supposed to go into a new variable, imputed. I also defined a custom function, getImputed, for the lookup.
Here is the code:
if(!require(plyr)){
  install.packages("plyr")
  library(plyr)
}
df = data.frame(d=c(1,1,1,2,2,2,3,3,3),
                g=rep(c(1,2,3),3),
                v=c(5,NA,NA,5,NA,NA,5,NA,NA))
imputed = data.frame(g=c(1,2,3),
                     v=c(5,10,15))
getImputed = function(p){
  imputed[imputed$g==p,"v"]
}
df = mutate(df,imputed=ifelse(is.na(v),getImputed(g),v))
df
And this is the resulting dataframe:
  d g  v imputed
1 1 1  5       5
2 1 2 NA      10
3 1 3 NA      15
4 2 1  5       5
5 2 2 NA      NA
6 2 3 NA      NA
7 3 1  5       5
8 3 2 NA      NA
9 3 3 NA      NA
As one can see, only the NAs in the first 3 rows were filled in by mutate; the NAs in rows 5, 6, 8 and 9 stayed NA. The ifelse call is likely the issue, but I can't see why.
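One way to probe this is to look at what the comparison inside getImputed produces when p is the whole g column rather than a single value. This is only a sketch in base R; it assumes that mutate hands the entire column to getImputed in one call, and the names mask and looked_up are made up for the illustration.

```r
# Same data as in the question
df = data.frame(d=c(1,1,1,2,2,2,3,3,3),
                g=rep(c(1,2,3),3),
                v=c(5,NA,NA,5,NA,NA,5,NA,NA))
imputed = data.frame(g=c(1,2,3),
                     v=c(5,10,15))

# The comparison getImputed performs when p is all 9 values of g:
# the 3 values of imputed$g are recycled against the 9 values of df$g
mask = imputed$g == df$g
length(mask)   # 9, not 1

# A length-9 logical index on a 3-row data frame: positions past row 3
# have nothing to select, so they come back as NA
looked_up = imputed[mask, "v"]
looked_up
```

If that assumption holds, looked_up has real values only in its first three positions, which would explain why ifelse finds NA for the later rows.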
What is weird is that if the imputed data frame has 4 rows, like this:
imputed = data.frame(g=c(1,2,3,4),
                     v=c(5,10,15,20))
then df is filled in properly:
  d g  v imputed
1 1 1  5       5
2 1 2 NA      10
3 1 3 NA      15
4 2 1  5       5
5 2 2 NA      10
6 2 3 NA      15
7 3 1  5       5
8 3 2 NA      10
9 3 3 NA      15
but R gave me a warning saying:
Warning message:
In imputed$g == p :
  longer object length is not a multiple of shorter object length
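The warning suggests that the same recycling happens here, just with lengths that do not divide evenly. A small base R check, again assuming the whole g column reaches the comparison at once (imputed4, mask and looked_up are names invented for this sketch):

```r
df = data.frame(d=c(1,1,1,2,2,2,3,3,3),
                g=rep(c(1,2,3),3),
                v=c(5,NA,NA,5,NA,NA,5,NA,NA))
imputed4 = data.frame(g=c(1,2,3,4),
                      v=c(5,10,15,20))

# 4 values recycled against 9: this line emits the
# "longer object length" warning
mask = imputed4$g == df$g
which(mask)    # only the first three positions match

# Only three TRUEs, so the lookup returns three values
looked_up = imputed4[mask, "v"]
length(looked_up)

# ifelse recycles the three looked-up values along the 9 rows,
# which happens to line up with the repeating 1,2,3 pattern of g
filled = ifelse(is.na(df$v), looked_up, df$v)
filled
```

So the "proper" result with 4 rows may be a coincidence of how g repeats in this particular data, not a sign that the lookup is working per row.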
Am I overlooking something?
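For comparison, here is a minimal sketch of a lookup that works on the whole column at once, using match() from base R instead of the == comparison. It is one possible approach, not necessarily the intended plyr idiom, and the name lookup is made up for the example.

```r
df = data.frame(d=c(1,1,1,2,2,2,3,3,3),
                g=rep(c(1,2,3),3),
                v=c(5,NA,NA,5,NA,NA,5,NA,NA))
imputed = data.frame(g=c(1,2,3),
                     v=c(5,10,15))

# match() returns, for each element of df$g, the row of imputed
# with the same g, so lookup always has one value per row of df
lookup = imputed$v[match(df$g, imputed$g)]

# Use the looked-up value only where v is missing
df$imputed = ifelse(is.na(df$v), lookup, df$v)
df
```

Because match() returns one position per element of df$g, the result does not depend on how many rows imputed has or on the order in which g repeats.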