Saturday, November 30, 2019

Conditional replacement while matching on a variable

I want to replace the NA values for observations within a particular subgroup (type == 2), but the rows in that group are not in the same order as the rows of the data frame that holds the replacement values. Is there a dplyr or plyr command that fills missing values in a column of one data frame with values from the corresponding column of another data frame, matching rows on a shared "key" column (here, name) rather than on position?
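In dplyr, matching rows on a key column is usually done with a join rather than with positional indexing. The sketch below is one possible approach, not taken from the original post; it assumes dplyr 1.0 or later is installed, redefines the question's sample data so it runs on its own, and fills only the missing diff values in the type == 2 group.

```r
suppressPackageStartupMessages(library(dplyr))

# Sample data from the question
df <- data.frame(type = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3),
                 diff = c(0.1, 0.3, NA, NA, NA, NA, NA, 0.2, 0.7, NA, 0.5, NA),
                 name = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "A", "B", "C"))
df2 <- data.frame(diff_rep = c(0.3, 0.2, 0.4), name = c("A", "B", "C"))

out <- df %>%
  # attach each row's replacement by name; NA where df2 has no such name
  left_join(df2, by = "name") %>%
  # use the looked-up value only for missing diff values in type 2
  mutate(diff = if_else(type == 2 & is.na(diff), diff_rep, diff)) %>%
  # drop the helper column again
  select(-diff_rep)

print(out)
```

left_join() keeps the rows of df in their original order, so the order of the rows in df2 does not matter. Names absent from df2 (D, E, F) get no replacement.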

Here's what I have so far. I hope someone can shed light on this. Thanks.

## data frame that contains missing values in the "diff" column

df <- data.frame(type = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3),
                 diff = c(0.1, 0.3, NA, NA, NA, NA, NA, 0.2, 0.7, NA, 0.5, NA),
                 name = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "A", "B", "C"))

## replace with values from this smaller data frame

df2 <- data.frame(diff_rep = c(0.3, 0.2, 0.4), name = c("A", "B", "C"))

## attempt: replace using ifelse()
## (df2$diff_rep is recycled by row position here, not matched on name)
df$diff <- ifelse(is.na(df$diff) & (df$type == 2), df2$diff_rep, df$diff)

df

   type diff name
1     1  0.1    A
2     1  0.3    B
3     1   NA    C
4     2  0.3    D
5     2  0.2    E
6     2  0.4    A
7     2  0.3    B
8     2  0.2    C
9     2  0.7    F
10    3   NA    A
11    3  0.5    B
12    3   NA    C
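The output above follows from how ifelse() treats a short `yes` argument: it is recycled to the length of the test vector, so row i receives element (i - 1) %% 3 + 1 of df2$diff_rep regardless of its name. Row 4 (name D) therefore gets 0.3, which is A's value in df2. A small base-R check of that behaviour on the question's data (the variable names here are illustrative):

```r
# Sample data from the question
df <- data.frame(type = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3),
                 diff = c(0.1, 0.3, NA, NA, NA, NA, NA, 0.2, 0.7, NA, 0.5, NA),
                 name = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "A", "B", "C"))
df2 <- data.frame(diff_rep = c(0.3, 0.2, 0.4), name = c("A", "B", "C"))

to_fill <- is.na(df$diff) & df$type == 2

# The question's attempt
attempt <- ifelse(to_fill, df2$diff_rep, df$diff)

# The same result with the recycling made explicit: df2$diff_rep is
# repeated as 0.3, 0.2, 0.4, 0.3, ... down all 12 rows, ignoring name
recycled <- rep_len(df2$diff_rep, nrow(df))
explicit <- ifelse(to_fill, recycled, df$diff)

print(identical(attempt, explicit))
```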

## desired output

   type diff name
1     1  0.1    A
2     1  0.3    B
3     1   NA    C
4     2   NA    D
5     2   NA    E
6     2  0.3    A
7     2  0.2    B
8     2  0.4    C
9     2   NA    F
10    3   NA    A
11    3  0.5    B
12    3   NA    C
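A base-R alternative (not from the original post) is to look each row's name up in df2 with match(), which returns the position of the first matching name or NA when there is none. The sketch below redefines the sample data and shows two variants: filling only the missing type-2 values, as the question text describes, and overwriting every type-2 value. The desired output above has 0.4 and NA in rows 8 and 9, where the input holds 0.2 and 0.7; that corresponds to the second variant.

```r
# Sample data from the question
df <- data.frame(type = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3),
                 diff = c(0.1, 0.3, NA, NA, NA, NA, NA, 0.2, 0.7, NA, 0.5, NA),
                 name = c("A", "B", "C", "D", "E", "A", "B", "C", "F", "A", "B", "C"))
df2 <- data.frame(diff_rep = c(0.3, 0.2, 0.4), name = c("A", "B", "C"))

# Look each row's name up in df2 instead of relying on row position;
# match() gives NA for names that df2 does not contain (D, E, F)
rep_vals <- df2$diff_rep[match(df$name, df2$name)]

# Variant 1: fill only the missing type-2 values
na_only <- df
fill <- is.na(na_only$diff) & na_only$type == 2
na_only$diff[fill] <- rep_vals[fill]

# Variant 2: overwrite every type-2 value with the looked-up one
overwrite <- df
in_group <- overwrite$type == 2
overwrite$diff[in_group] <- rep_vals[in_group]

print(overwrite)
```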
