I am trying to replace the NA values in a column with the median of their group, conditional on the size of that group (as defined by group_by), in a large data set.
library(dplyr)
set.seed(10000)
Data <- data.frame(
  X = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA),
  Y = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"),
  Z = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
# change the NA in position 10 to 10
Data <- Data %>%
  # group by Y, then
  group_by(Y) %>%
  # if the group has fewer than 2 rows: replace NA in X with 10, keep other values of X
  # otherwise (group size 2 or more): the value becomes NA
  mutate(X = ifelse(n<2,ifelse(is.na(X),10,X),NA))
# change the NA in position 1 to 1
Data <- Data %>%
  # group by Y and Z, then
  group_by(Y,Z) %>%
  # if the group has fewer than 3 rows: replace NA in X with 1, keep other values of X
  # otherwise (group size 3 or larger): leave the value as X
  mutate(X = ifelse(n<3,ifelse(is.na(X),1,X),X))
This results in the following error:
Error in n > 1 :
comparison (6) is possible only for atomic and list types
After running the code above, I expect column X to contain the sequence 1:10.
This is a simplified version of a problem I have with a large data set, where I am trying to impute NA values with the median of different groupings, conditional on the size of each group. I get the same error there.
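For reference, here is a minimal sketch of what I believe the intended logic looks like. It assumes the error comes from writing `n` without parentheses: inside `mutate()` there is no column called `n`, so the name resolves to the dplyr function itself, and comparing a function with a number fails. Calling `n()` returns the size of the current group instead. The names `Result` and `Imputed`, and the size threshold of 3 in the median step, are placeholders I chose for illustration, not part of the original problem.

```r
library(dplyr)

Data <- data.frame(
  X = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA),
  Y = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"),
  Z = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Toy example: fill NAs with fixed values depending on group size.
Result <- Data %>%
  group_by(Y) %>%
  # n() is the size of the current group; only groups with fewer than
  # 2 rows have their NA replaced by 10, all other values are kept
  mutate(X = ifelse(n() < 2 & is.na(X), 10, X)) %>%
  group_by(Y, Z) %>%
  # groups with fewer than 3 rows have their remaining NA replaced by 1
  mutate(X = ifelse(n() < 3 & is.na(X), 1, X)) %>%
  ungroup()

# Median variant: impute NA with the group median, but only in groups
# with at least 3 rows (the threshold 3 is a hypothetical choice).
Imputed <- Data %>%
  group_by(Y) %>%
  mutate(X = ifelse(is.na(X) & n() >= 3, median(X, na.rm = TRUE), X)) %>%
  ungroup()
```

With this toy data, `Result$X` should come out as 1 through 10. In `Imputed`, the NA in the "yes" group is filled with that group's median, while the single-row "no" group is too small and keeps its NA. Note that the first `mutate()` in the original code sets every value in groups of size 2 or more to NA, which is probably not what was wanted, so the sketch keeps `X` in that branch.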