Sample df:
library(tidyverse)  # tibble(), dplyr::min_rank(), stringr::fruit, %>%
set.seed(1)
df <- tibble(name = fruit[1:10],
             A = rpois(10, 10),
             B = rpois(10, 2),
             C = rpois(10, 6),
             D = rpois(10, 2))
name A B C D A_rank AB_rank ABC_rank ABCD_rank
1 apple 8 1 8 2 9 9 8 7
2 apricot 10 1 7 3 7 7 4 5
3 avocado 7 0 8 0 10 10 9 10
4 banana 11 1 3 2 4 5 9 9
5 bell pepper 14 4 7 3 1 1 1 1
6 bilberry 12 1 5 3 3 3 4 5
7 blackberry 11 2 8 2 4 3 3 3
8 blackcurrant 9 2 7 4 8 7 4 4
9 blood orange 14 2 8 2 1 2 2 2
10 blueberry 11 1 6 1 4 5 4 7
This builds on a question I asked previously, where I wanted to perform row-wise calculations to compute ranks on the cumulative sums of the columns (A, then A + B, and so on), where a higher sum gets a better (numerically lower) rank.
df <- cbind(df, apply(-apply(df[, -1], 1, cumsum), 1, min_rank) %>%
              as_tibble() %>%
              rename(A_rank = A, AB_rank = B, ABC_rank = C, ABCD_rank = D))
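To unpack what that one-liner does, here is a minimal base R sketch of the same two steps on the sample values. The names `scores`, `cum`, and `ranks` are illustrative only, and base `rank(ties.method = "min")` is used as a stand-in for dplyr's `min_rank`.

```r
# Sample scores from the question (the set.seed(1) output), as a plain matrix.
scores <- cbind(
  A = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11),
  B = c(1, 1, 0, 1, 4, 1, 2, 2, 2, 1),
  C = c(8, 7, 8, 3, 7, 5, 8, 7, 8, 6),
  D = c(2, 3, 0, 2, 3, 3, 2, 4, 2, 1)
)

# Step 1: running totals along each row (A, A+B, A+B+C, A+B+C+D).
# apply() over rows returns one column per row, so transpose back.
cum <- t(apply(scores, 1, cumsum))

# Step 2: rank each running-total column; negating makes larger totals rank first.
ranks <- apply(-cum, 2, rank, ties.method = "min")
colnames(ranks) <- c("A_rank", "AB_rank", "ABC_rank", "ABCD_rank")
```

`ranks` should line up with the four rank columns in the sample table, ties included.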
However, what I would like now is to incorporate a custom rules-based tie-breaker function, which neither base R nor dplyr provides. The rules for my tie-breaker function at each rank calculation are:
- The fruit with the highest number of points in the most events wins
- If a tie still remains, then the fruit with the largest number of points in any single column will be given the higher place.
- If the tie still exists, compare the second highest number of points, and so on.
- Else, use min_rank.
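One possible reading of these rules, for the simplest case of two tied fruits, can be sketched in base R. `break_tie` is a hypothetical helper, not an existing base R or dplyr function, and it assumes "most events" means the higher score in more columns. How the rules extend to ties among three or more fruits is left open here.

```r
# Hypothetical helper: compare two fruits with the same cumulative sum.
# x and y are the per-column scores counted so far, e.g. c(A, B).
# Returns 1 if x places higher, -1 if y places higher, 0 if still tied.
break_tie <- function(x, y) {
  # Rule 1: the fruit with the higher score in more columns wins.
  wins <- sum(x > y) - sum(y > x)
  if (wins != 0) return(sign(wins))

  # Rules 2 and 3: compare the largest scores, then the second largest, etc.
  d <- sort(x, decreasing = TRUE) - sort(y, decreasing = TRUE)
  d <- d[d != 0]
  if (length(d) > 0) return(sign(d[1]))

  # Rule 4: nothing separates them, so both keep the shared min_rank.
  0
}

break_tie(c(12, 1), c(11, 2))       # bilberry vs blackberry on A, B
break_tie(c(11, 1), c(11, 1))       # banana vs blueberry on A, B
break_tie(c(11, 1, 3), c(7, 0, 8))  # banana vs avocado on A, B, C
```

Returning a sign rather than a rank keeps the comparison separate from the step that turns it into rank adjustments within a tied group.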
So, in my df, looking at the first rank computation, just for A:
df %>% select(name, A, A_rank) %>% arrange(A_rank)
name A A_rank
1 bell pepper 14 1
2 blood orange 14 1
3 bilberry 12 3
4 banana 11 4
5 blackberry 11 4
6 blueberry 11 4
7 apricot 10 7
8 blackcurrant 9 8
9 apple 8 9
10 avocado 7 10
Here, as we just started with the first rank, the fruits with tied scores use min_rank, which is fine as there is no more information.
After summing row-wise columns A and B:
df %>% select(name, A, B, AB_rank) %>% arrange(AB_rank)
name A B AB_rank
1 bell pepper 14 4 1
2 blood orange 14 2 2
3 bilberry 12 1 3
4 blackberry 11 2 3
5 banana 11 1 5
6 blueberry 11 1 5
7 apricot 10 1 7
8 blackcurrant 9 2 7
9 apple 8 1 9
10 avocado 7 0 10
Here, bilberry and blackberry each have one column where they score higher than the other fruit, so a tie still remains and I want to move on to the second rule: bilberry will rank 3 as it has the higher number, 12, in the A col, while blackberry goes to rank 4. The same reasoning puts apricot (10 in A) at rank 7 and blackcurrant (9 in A) at rank 8.
For banana and blueberry, because a tie would still remain after applying my two rules, use min_rank, which is fine here.
Expected output
name A B AB_rank
1 bell pepper 14 4 1
2 blood orange 14 2 2
3 bilberry 12 1 3
4 blackberry 11 2 4
5 banana 11 1 5
6 blueberry 11 1 5
7 apricot 10 1 7
8 blackcurrant 9 2 8
9 apple 8 1 9
10 avocado 7 0 10
Now, using the sums of A, B, C:
df %>% select(name, A, B, C, ABC_rank) %>% arrange(ABC_rank)
name A B C ABC_rank
1 bell pepper 14 4 7 1
2 blood orange 14 2 8 2
3 blackberry 11 2 8 3
4 apricot 10 1 7 4
5 bilberry 12 1 5 4
6 blackcurrant 9 2 7 4
7 blueberry 11 1 6 4
8 apple 8 1 8 8
9 avocado 7 0 8 9
10 banana 11 1 3 9
Fruits apricot, bilberry, blackcurrant, and blueberry have the same sum. Applying the first rule, blueberry becomes rank 7, as it has no number which is the highest in any of the three columns A, B, C. Then, bilberry will have a rank of 4, as it has the highest figure, 12, in A, then apricot with rank 5 as it has a figure of 10, then blackcurrant is rank 6.
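A small base R sketch for inspecting this four-way tie: it tabulates, per column, which fruits hold the group's highest value, plus each fruit's largest single score. The names `tied`, `is_top`, `top_count`, and `best_score` are illustrative, and counting a shared high (the 7 in C) for every fruit that has it is an assumption that the rules as written do not settle.

```r
# The four fruits tied on A + B + C = 18 in the sample data.
tied <- rbind(
  apricot      = c(A = 10, B = 1, C = 7),
  bilberry     = c(A = 12, B = 1, C = 5),
  blackcurrant = c(A = 9,  B = 2, C = 7),
  blueberry    = c(A = 11, B = 1, C = 6)
)

# TRUE where a fruit holds the group's highest value in that column.
# Assumption: a shared high counts for every fruit that has it.
col_max <- apply(tied, 2, max)
is_top <- sweep(tied, 2, col_max, FUN = "==")

# Number of columns in which each fruit is (joint) highest: input to rule 1.
top_count <- rowSums(is_top)

# Largest single score per fruit: input to rule 2.
best_score <- apply(tied, 1, max)
```

Printing `top_count` and `best_score` side by side makes it easier to check each rule against the expected ranks by hand.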
Looking at avocado and banana, banana would be rank 9, as it has two values which are larger than avocado's, in cols A and B, while avocado would become rank 10.
Expected output
name A B C ABC_rank
1 bell pepper 14 4 7 1
2 blood orange 14 2 8 2
3 blackberry 11 2 8 3
4 bilberry 12 1 5 4
5 apricot 10 1 7 5
6 blackcurrant 9 2 7 6
7 blueberry 11 1 6 7
8 apple 8 1 8 8
9 banana 11 1 3 9
10 avocado 7 0 8 10
This is quite complex, and I'm not sure what the best solution for tackling this is. Possibly an if else statement?