Saturday, April 18, 2015

Why didn't mutate fill all rows? Using mutate and ifelse to look up imputed values from another data frame

I was trying to use mutate from the plyr package to look up a value from another data frame whenever the v variable in the original data frame is NA. The looked-up value is supposed to go into a new variable, imputed. I also defined a custom function, getImputed, for the lookup.


Here is the code:



if(!require(plyr)){
    install.packages("plyr")
    library(plyr)
}
df = data.frame(d=c(1,1,1,2,2,2,3,3,3),
                g=rep(c(1,2,3),3),
                v=c(5,NA,NA,5,NA,NA,5,NA,NA))
imputed = data.frame(g=c(1,2,3),
                     v=c(5,10,15))
getImputed = function(p){
    imputed[imputed$g==p,"v"]
}
df = mutate(df,imputed=ifelse(is.na(v),getImputed(g),v))
df


And this is the resulting dataframe:



  d g  v imputed
1 1 1  5       5
2 1 2 NA      10
3 1 3 NA      15
4 2 1  5       5
5 2 2 NA      NA
6 2 3 NA      NA
7 3 1  5       5
8 3 2 NA      NA
9 3 3 NA      NA


As one can see, mutate filled in only the first 3 rows. The ifelse call is likely the issue, but I can't see why.
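One way to narrow this down might be to evaluate the pieces of the ifelse call by hand. The sketch below rebuilds the same df and 3-row imputed in base R (plyr is not needed for this check) and looks at what getImputed receives and returns. The names test and looked_up are introduced here only for illustration.

```r
df <- data.frame(d = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 g = rep(c(1, 2, 3), 3),
                 v = c(5, NA, NA, 5, NA, NA, 5, NA, NA))
imputed <- data.frame(g = c(1, 2, 3),
                      v = c(5, 10, 15))
getImputed <- function(p) {
  imputed[imputed$g == p, "v"]
}

# Inside mutate, g is the whole column, so p has length 9, not 1.
# The 3 values of imputed$g are recycled against the 9 values of df$g.
test <- imputed$g == df$g
length(test)   # 9
all(test)      # TRUE: every recycled position happens to match

# A length-9 logical index applied to a 3-row data frame:
# positions 4 to 9 point past the last row and come back as NA.
looked_up <- getImputed(df$g)
looked_up      # 5 10 15 NA NA NA NA NA NA
```

If this reading is right, getImputed is not called once per row. It is called once with the full g column, and the NAs in rows 5, 6, 8 and 9 come from indexing past the end of imputed.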


What is weird is that if the imputed data frame has 4 rows, like this:



imputed = data.frame(g=c(1,2,3,4),
v=c(5,10,15,20))


then the df data frame is filled in properly:



  d g  v imputed
1 1 1  5       5
2 1 2 NA      10
3 1 3 NA      15
4 2 1  5       5
5 2 2 NA      10
6 2 3 NA      15
7 3 1  5       5
8 3 2 NA      10
9 3 3 NA      15


but R gave me a warning saying:



Warning message:
In imputed$g == p :
longer object length is not a multiple of shorter object length
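The warning suggests that == is recycling the 4 values of imputed$g against the 9 values of g. A small base R sketch of what that comparison probably produces is given here. imputed4, hits, short and filled are illustrative names, and suppressWarnings is used only to keep the output quiet.

```r
g <- rep(c(1, 2, 3), 3)
imputed4 <- data.frame(g = c(1, 2, 3, 4),
                       v = c(5, 10, 15, 20))

# 9 is not a multiple of 4, which is what triggers the warning.
# imputed4$g is recycled to 1 2 3 4 1 2 3 4 1 before the comparison.
hits <- suppressWarnings(imputed4$g == g)
hits    # TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE

# Only the first three positions are TRUE, so the lookup has length 3.
short <- imputed4[hits, "v"]
short   # 5 10 15

# ifelse() recycles a too-short `yes` argument to the length of its test.
filled <- ifelse(rep(TRUE, 9), short, NA)
filled  # 5 10 15 5 10 15 5 10 15
```

Under that reading, the 4-row result looks right only because g in df happens to repeat 1, 2, 3 in the same order as the recycled lookup. With a differently ordered g the values would presumably land in the wrong rows.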


Am I overlooking something?
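For comparison, here is a minimal sketch of a lookup that does not depend on recycling, assuming each g value appears exactly once in imputed. It uses base R's match() instead of getImputed, so it is a swapped-in technique rather than a repair of the original function. idx is an illustrative name.

```r
df <- data.frame(d = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 g = rep(c(1, 2, 3), 3),
                 v = c(5, NA, NA, 5, NA, NA, 5, NA, NA))
imputed <- data.frame(g = c(1, 2, 3),
                      v = c(5, 10, 15))

# For each value of df$g, match() gives its row position in imputed$g,
# so idx has one entry per row of df.
idx <- match(df$g, imputed$g)

# Both branches of ifelse() now have the same length as the test.
df$imputed <- ifelse(is.na(df$v), imputed$v[idx], df$v)
df
```

The same expression should also work inside mutate, since match() is vectorized over the whole g column. A g value missing from imputed would give NA in idx and therefore NA in the result.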

