Tuesday, March 28, 2017

ifelse dplyr showing wrong output

I want to create a new column that takes the minimum value of three columns and then adds to or subtracts from it, depending on a condition.

I have the following data frame, called df:

     a    b    c
1  0.60 0.27 0.14
2  0.48 0.32 0.21
3  0.42 0.24 0.35
4  0.28 0.33 0.41
5  0.52 0.28 0.22
6  0.34 0.30 0.37
7  0.38 0.28 0.35
8  0.34 0.28 0.40
9  0.53 0.26 0.22
10 0.17 0.27 0.58
11 0.34 0.35 0.33
12 0.19 0.27 0.56
13 0.56 0.29 0.17
14 0.55 0.28 0.19
15 0.29 0.24 0.48
16 0.23 0.31 0.47
17 0.40 0.32 0.28
18 0.50 0.27 0.24
19 0.45 0.28 0.27
20 0.68 0.26 0.05
21 0.40 0.32 0.28
22 0.23 0.26 0.50
23 0.46 0.33 0.20
24 0.46 0.24 0.28
25 0.44 0.24 0.31
26 0.46 0.26 0.27
27 0.30 0.29 0.40
28 0.45 0.20 0.34
29 0.53 0.27 0.20
30 0.33 0.34 0.33
31 0.20 0.26 0.55
32 0.65 0.29 0.06
33 0.45 0.24 0.32
34 0.30 0.26 0.45
35 0.20 0.36 0.45
36 0.38 0.16 0.38

Every row must sum to 1, but as you can see below, only some of them satisfy that condition.

df_total <- rowSums(df[c("a", "b", "c")])
print(df_total)
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
1.01 1.01 1.01 1.02 1.02 1.01 1.01 1.02 1.01 1.02 1.02 1.02 1.02 1.02 1.01 1.01 1.00 1.01 1.00 
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36 
0.99 1.00 0.99 0.99 0.98 0.99 0.99 0.99 0.99 1.00 1.00 1.01 1.00 1.01 1.01 1.01 0.92
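One thing worth keeping in mind when testing whether a row total equals 1: sums of decimal fractions are stored as floating-point numbers, so an exact `== 1` comparison may not behave as expected. A minimal sketch of a tolerance-based check follows; the three totals are rebuilt from rows 1, 17 and 36 of df, and the names `totals`, `tol` and `status` are only illustrative.

```r
# Totals for rows 1, 17 and 36 of df, computed from the values shown above
totals <- c(0.60 + 0.27 + 0.14,
            0.40 + 0.32 + 0.28,
            0.38 + 0.16 + 0.38)

# Assumed tolerance; anything closer to 1 than this is treated as "ok"
tol <- 1e-8

# All three arguments have the same length, so positions stay aligned
status <- ifelse(abs(totals - 1) < tol, "ok",
                 ifelse(totals > 1, "above", "below"))
print(status)
```

Whether the tolerance should be 1e-8 or something looser depends on how the data were rounded, so treat the value as a placeholder.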

For example, in row 36 of df the total is 0.92, so I need to add 0.08 to the lowest value (which is 0.16) so that a, b and c sum to 1.
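The adjustment described here can be written without any branching, since adding `1 - total` to the row minimum covers the above, below and equal cases at once. The following is only a sketch under that reading of the requirement; `df_small` and `gap` are hypothetical names, and the sample holds rows 1, 17 and 36 of df.

```r
# Hypothetical sample: rows 1, 17 and 36 of df
df_small <- data.frame(
  a = c(0.60, 0.40, 0.38),
  b = c(0.27, 0.32, 0.16),
  c = c(0.14, 0.28, 0.38)
)

# Gap between 1 and each row total (positive when the row falls short of 1)
gap <- 1 - rowSums(df_small[c("a", "b", "c")])

# Row-wise minimum shifted by the gap; rounding hides floating-point noise
df_small$d <- round(pmin(df_small$a, df_small$b, df_small$c) + gap, 2)
print(df_small)
```

The same expression, `pmin(a, b, c) + (1 - (a + b + c))`, should also work inside `mutate()`, because `pmin()` is vectorised over rows.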

I guess there's an easier way to do this. This is the code I have written so far, and it doesn't work. Why?

df_total <- rowSums(df[c("a", "b", "c")])

df_for_sum <- df_total[df_total > 1] - 1  #The ones which are above 1
df_for_minus <- -(df_total[df_total < 1]) + 1  #The ones which are below 1 
equal_to_100 <- df_total[df_total == 1]  #The ones which are ok

df <- df %>%
  mutate(d = ifelse(rowSums(df[c("a","b","c")]) > 1,
                            apply(df[rowSums(df[c("a","b","c")]) > 1,], 1, min) - df_for_sum,
                    ifelse(rowSums(df[c("a","b","c")]) < 1,
                           apply(df[rowSums(df[c("a","b","c")]) < 1,], 1, min) + df_for_minus,
                           ifelse(rowSums(df[c("a","b","c")]) == 1,
                                  apply(df[rowSums(df[c("a","b","c")]) == 1,], 1, min), ""))))

And this is the output:

      a    b    c                  d
1  0.60 0.27 0.14               0.13
2  0.48 0.32 0.21                0.2
3  0.42 0.24 0.35               0.23
4  0.28 0.33 0.41               0.26
5  0.52 0.28 0.22                0.2
6  0.34 0.30 0.37               0.29
7  0.38 0.28 0.35               0.27
8  0.34 0.28 0.40               0.26
9  0.53 0.26 0.22               0.21
10 0.17 0.27 0.58               0.15
11 0.34 0.35 0.33               0.31
12 0.19 0.27 0.56               0.17
13 0.56 0.29 0.17               0.15
14 0.55 0.28 0.19               0.17
15 0.29 0.24 0.48               0.23
16 0.23 0.31 0.47               0.22
17 0.40 0.32 0.28               0.33  # From here to the end it's wrong!
18 0.50 0.27 0.24               0.19
19 0.45 0.28 0.27               0.28
20 0.68 0.26 0.05               0.24
21 0.40 0.32 0.28               0.28
22 0.23 0.26 0.50               0.26
23 0.46 0.33 0.20               0.25
24 0.46 0.24 0.28               0.27
25 0.44 0.24 0.31                0.3
26 0.46 0.26 0.27               0.21
27 0.30 0.29 0.40               0.24
28 0.45 0.20 0.34 0.0599999999999999
29 0.53 0.27 0.20               0.33
30 0.33 0.34 0.33               0.06
31 0.20 0.26 0.55               0.15
32 0.65 0.29 0.06               0.27
33 0.45 0.24 0.32               0.17
34 0.30 0.26 0.45               0.15
35 0.20 0.36 0.45               0.17
36 0.38 0.16 0.38               0.24
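A likely cause of the misalignment: `ifelse()` expects its yes and no arguments to be as long as the test, and recycles them when they are shorter. In the code above each branch is computed on a subset of the rows, so its values only line up while the subset and the full data frame happen to match. That would explain why rows 1 to 16, which all have totals above 1, look right. The sketch below reproduces the effect on a toy vector; every name in it is illustrative.

```r
test <- c(TRUE, FALSE, TRUE, FALSE)

# Hypothetical "yes" vector built from only the TRUE rows, so it has length 2
yes_short <- c(10, 20)

# ifelse() recycles yes_short to length 4 (10, 20, 10, 20) and reads
# position i for each TRUE, so the third element gets 10, not 20
recycled <- ifelse(test, yes_short, 0)
print(recycled)

# Computing "yes" for every row keeps the positions aligned
yes_full <- c(10, NA, 20, NA)
aligned <- ifelse(test, yes_full, 0)
print(aligned)

# A "" fallback that is actually used coerces the whole result to character
print(class(ifelse(c(TRUE, FALSE), 1, "")))
```

The last line may also be why d prints values such as 0.0599999999999999: once the column is character, the numbers are no longer formatted as numerics.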

Any thoughts? Is there an easier way?
