dimanche 20 mai 2018

Sum data points in the rows from data frame if they meet criteria from another data frame in R

I have two data frames both with 220 obs and 80 variables. The first data frame, df1, has only the data points 1, 2, and 3. The second data frame, df2, has different numeric values consisting of decimals, such as 0.12, -0.03, 0.01 etc. (supposed to portray market cap weighted stock returns for a given month).

For example

df1 = data.frame(a = c(2, 2, 1), b = c(3, 2, 3), c = c(1, 1, 2), d = c(3, 3, 1))

  a b c d
1 2 3 1 3
2 2 2 1 3
3 1 3 2 1

df2 = data.frame(a = c(0.1, 0.1, 0.2), b = c(0.3, 0.4, 0.6), c = c(0.2, 0.3, 0.5), d = c(0.1, 0.5, 0.6))

    a   b   c   d
1 0.1 0.3 0.2 0.1
2 0.1 0.4 0.3 0.5
3 0.2 0.6 0.5 0.6

How can I sum the rows of df2and turn into a matrix with 220 obs and 3 variables based on the values in df1. Note that df1 and df2 have the same column names in the same order. How can I create a third data frame df3 based on the indicator variables from df1 by summing the rows of df2? I want to sum the rows of df2 based on the values in df1 to create df3:

df3 =
   X1  X2  X3
1 0.2 0.1 0.4
2 0.3 0.5 0.5
3 0.8 0.5 0.6

Let's first look at (X1,1). Row 1 in df1 only contain one data point with value 1, which is (c,1). Thus, we sum row 1 of df2 to get 0.2. Now look at (X1,3) (last value of column X1). Observe row 3 in df1 to find two data points with value 1. In df2 those two values are 0.2 (a,3) and 0.6 (d,3), and sum the values to get 0.8.

Here is the explanation of how df3 looks like:

calculation = data.frame("1" = c("0+0+0.2+0", "0+0+0.3+0", "0.2+0+0+0.6"), "2" = c("0.1+0+0+0", "0.1+0.4+0+0", "0+0+0.5+0"), "3" = c("0+0.3+0+0.1", "0+0+0+0.5", "0+0.6+0+0"))

                 X1                X2                X3
1   0 + 0 + 0.2 + 0   0.1 + 0 + 0 + 0 0 + 0.3 + 0 + 0.1
2   0 + 0 + 0.3 + 0 0.1 + 0.4 + 0 + 0   0 + 0 + 0 + 0.5
3 0.2 + 0 + 0 + 0.6   0 + 0 + 0.5 + 0   0 + 0.6 + 0 + 0

More practical explanation based on stocks. Assume df1 is a matrix that describes buy, hold, and sell recommendations. df2 describes the market weighted stock returns. All variables/columns are different stocks. df3 creates a matrix with three different portfolios. If the stock is "buy", I want to put it in a "buy" portfolio. If the stock is "hold", I want to put it in a "hold" portfolio, etc. This is easily done in Excel with nested IF,AND,OR functions, but I do not know how to do it in R.

Aucun commentaire:

Enregistrer un commentaire