Consider the following two data frames:
df <- data.frame(REGION = c("REG01","REG02","REG03","REGSUM"),
                 INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                 VALUE = c(NA,10,NA,30))
and:
df2 <- data.frame(REGION = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE = c(5,15,20,40))
I want to do the following calculation: if a value in df is NA, I want to estimate it using the shares from df2. Because I know the total in df, I know that the amount to distribute between the two NA elements in df is df[REGSUM,INDU01] - df[REG02,INDU01] = 30 - 10 = 20.
Each of the corresponding elements in df2 should then be divided by the sum of the df2 elements whose counterparts in df are NA:
df2_share[REG01,INDU01] = 5 / (5 + 20) = 0.2
df2_share[REG03,INDU01] = 20 / (5 + 20) = 0.8
These shares should then be used to estimate the NA values in df, so that I end up with the following data frame:
  REGION INDUSTRY VALUE
1 REG01  INDU01   0.2 * 20 = 4
2 REG02  INDU01   10
3 REG03  INDU01   0.8 * 20 = 16
4 REGSUM INDU01   30
Can I do that in R? (I have a lot of regions and industries in my data frame.)
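For reference, the calculation described above could be sketched in base R roughly as follows. This is only one possible approach, and it rests on a few assumptions: each industry has exactly one total row labelled "REGSUM", df and df2 contain the same REGION/INDUSTRY pairs, and the helper name fill_from_shares and its total argument are made up for illustration.

```r
df <- data.frame(REGION = c("REG01","REG02","REG03","REGSUM"),
                 INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                 VALUE = c(NA,10,NA,30))
df2 <- data.frame(REGION = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE = c(5,15,20,40))

# Hypothetical helper: fills NA values in df using shares taken from df2.
# Assumes one total row (labelled `total`) per industry.
fill_from_shares <- function(df, df2, total = "REGSUM") {
  # Put the reference values from df2 next to the values from df
  m <- merge(df, df2, by = c("REGION", "INDUSTRY"), suffixes = c("", ".ref"))

  # Process each industry separately
  pieces <- lapply(split(m, m$INDUSTRY), function(g) {
    is_total <- g$REGION == total
    missing  <- is.na(g$VALUE) & !is_total

    # Amount left to distribute: total minus the known regional values
    remainder <- g$VALUE[is_total] - sum(g$VALUE[!is_total], na.rm = TRUE)

    # Shares of the missing regions, computed from df2
    shares <- g$VALUE.ref[missing] / sum(g$VALUE.ref[missing])
    g$VALUE[missing] <- remainder * shares
    g
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out[, c("REGION", "INDUSTRY", "VALUE")]
}

result <- fill_from_shares(df, df2)
print(result)
```

Because the work is done per industry via split(), the same call should carry over to a data frame with many regions and industries, as long as the assumptions above hold. Note that merge() sorts the rows by REGION and INDUSTRY, so the row order of the result may differ from that of the input.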