Sunday, January 17, 2021

If a value is NA, estimate it based on shares in R

Consider the two following data frames:

df  <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(NA,10,NA,30))

and:

df2 <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(5,15,20,40))

I want to do the following calculation: if a value in df is NA, I want to estimate it from the shares in df2. Because I know the sum in df, I know that I have to distribute the value df[REGSUM,INDU01] - df[REG02,INDU01] = 30 - 10 = 20 between the two NA elements in df.

The corresponding elements in df2 should then each be divided by the sum of the df2 elements whose counterparts in df are NA:

df2_share[REG01,INDU01] = 5  / (5 + 20) = 0.2
df2_share[REG03,INDU01] = 20 / (5 + 20) = 0.8
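
For concreteness, the share calculation for this single industry can be sketched in base R as follows. This is only an illustration of the arithmetic above: the names `missing` and `df2_share` are chosen here, and the snippet assumes that df and df2 list the same regions in the same row order.

```r
# Data frames from the question, repeated so the snippet runs on its own
df  <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(NA,10,NA,30))
df2 <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(5,15,20,40))

# Rows of df with a missing VALUE (REG01 and REG03);
# assumes df and df2 have identical row order
missing <- is.na(df$VALUE)

# Share of each missing region in the df2 total over the missing regions only
df2_share <- df2$VALUE[missing] / sum(df2$VALUE[missing])
names(df2_share) <- as.character(df2$REGION[missing])

df2_share
# REG01 REG03
#   0.2   0.8
```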

These shares should then be used to estimate the NA values in df, so that I end up with the following data frame:

    REGION  INDUSTRY   VALUE
1   REG01   INDU01     0.2 * 20 = 4 
2   REG02   INDU01     10   
3   REG03   INDU01     0.8 * 20 = 16    
4   REGSUM  INDU01     30

Can I do that in R? I have a lot of regions and industries in my data frame.
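
One possible way to generalise the calculation to many industries is sketched below in base R. It is a minimal sketch under stated assumptions, not a definitive solution: the function name `fill_na_by_share`, its `total` argument and the helper column `VALUE_REF` are hypothetical names introduced here, and the code assumes that every industry has exactly one total row labelled "REGSUM" with a known value, and that every REGION/INDUSTRY pair in df also appears in df2.

```r
# Data frames from the question, repeated so the snippet runs on its own
df  <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(NA,10,NA,30))
df2 <- data.frame(REGION   = c("REG01","REG02","REG03","REGSUM"),
                  INDUSTRY = c("INDU01","INDU01","INDU01","INDU01"),
                  VALUE    = c(5,15,20,40))

# Hypothetical helper: fill NA values in df using shares taken from df2
fill_na_by_share <- function(df, df2, total = "REGSUM") {
  # Rename the reference column so it does not clash with df$VALUE
  ref <- df2
  names(ref)[names(ref) == "VALUE"] <- "VALUE_REF"
  # Match rows by key rather than relying on row order
  m <- merge(df, ref, by = c("REGION", "INDUSTRY"))

  # Treat each industry separately
  parts <- lapply(split(m, m$INDUSTRY, drop = TRUE), function(g) {
    is_total <- g$REGION == total
    miss     <- is.na(g$VALUE) & !is_total
    # Amount left to distribute: known total minus the known regional values
    remainder <- g$VALUE[is_total] - sum(g$VALUE[!is_total], na.rm = TRUE)
    # Distribute it in proportion to the df2 values of the missing regions
    g$VALUE[miss] <- remainder * g$VALUE_REF[miss] / sum(g$VALUE_REF[miss])
    g
  })

  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out[, c("REGION", "INDUSTRY", "VALUE")]
}

result <- fill_na_by_share(df, df2)
result$VALUE
# 4 10 16 30
```

Because `merge()` sorts by the key columns, the rows of the result may come back in a different order than in df when the input is not already sorted by REGION and INDUSTRY.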
