Tuesday, April 16, 2019

R, dplyr: How to replace NA values conditionally on the size of the group_by group

In a large data set, I am trying to replace the NA values in a column with the median of their group, but only when the group produced by group_by has a certain size.

library(dplyr)

Data <- data.frame(
    X = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA),
    Y = c("yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"),
    Z = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# replace the NA in row 10 with 10
Data <- Data %>%
    # group by Y, then
    group_by(Y) %>%
    # if the group has fewer than 2 rows: change X to 10 where it is NA, otherwise keep X
    # if the group has 2 or more rows: set the value to NA
    mutate(X = ifelse(n<2,ifelse(is.na(X),10,X),NA))

# replace the NA in row 1 with 1
Data <- Data %>%
    # group by Y and Z, then
    group_by(Y,Z) %>%
    # if the group has fewer than 3 rows: change X to 1 where it is NA, otherwise keep X
    # if the group has 3 or more rows: leave the value as X
    mutate(X = ifelse(n<3,ifelse(is.na(X),1,X),X))

This results in the following error:

Error in n > 1 :
  comparison (6) is possible only for atomic and list types

I expect column X to be the sequence 1:10 after running the code above.
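For reference, a minimal sketch of one way the toy example can be made to produce 1:10. It rests on two assumptions. First, in dplyr the group size is the function call `n()`; a bare `n` refers to the function object itself, which cannot be compared with a number and would explain the error. Second, the outer "else" of the first mutate is changed from NA to X, because otherwise every row of the larger group would be overwritten with NA. The name `Result` is only illustrative.

```r
library(dplyr)

Data <- data.frame(
    X = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA),
    Y = c(rep("yes", 9), "no"),
    Z = c(TRUE, rep(FALSE, 8), TRUE)
)

Result <- Data %>%
    group_by(Y) %>%
    # groups with fewer than 2 rows: replace NA with 10; all other values keep X
    mutate(X = ifelse(n() < 2 & is.na(X), 10, X)) %>%
    group_by(Y, Z) %>%
    # groups with fewer than 3 rows: replace NA with 1; all other values keep X
    mutate(X = ifelse(n() < 3 & is.na(X), 1, X)) %>%
    ungroup()

Result$X
```

Note that `n()` counts every row in the group, including rows where X is NA.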

This is a simplified version of a problem I have with a large data set. There, I am trying to impute NA values with the median of different groupings, conditional on the size of each group, and I get the same error as above.
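The median imputation described here might look like the following sketch. The threshold `min_size` and the result name `Imputed` are hypothetical, and the rule "impute only in groups with at least `min_size` rows" is an assumption about the intended condition, not something stated in the question.

```r
library(dplyr)

# Hypothetical threshold: impute only in groups with at least this many rows
min_size <- 3

Data <- data.frame(
    X = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, NA),
    Y = c(rep("yes", 9), "no")
)

Imputed <- Data %>%
    group_by(Y) %>%
    # replace NA with the group median (ignoring NAs) when the group is large enough;
    # in smaller groups, and for non-missing values, keep X unchanged
    mutate(X = ifelse(is.na(X) & n() >= min_size,
                      median(X, na.rm = TRUE),
                      X)) %>%
    ungroup()

Imputed$X
```

In this toy data the NA in the "yes" group (9 rows) is replaced by that group's median, while the NA in the single-row "no" group is left as NA. To condition on a different grouping, change the columns passed to `group_by`.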
