Sunday, 22 August 2021

How do I use the result of sequential filters to identify larger groups?

This question is a continuation of this question.

Here is my original example code. I am trying to first identify all the groups in my larger dataset in which exactly one row has x = "Yes" and a y equal to the minimum y among the x = "Yes" rows (a given group may contain several x = "Yes" rows).

Ideally, I'd like a more general way to handle this, since other cases also arise where several scenarios need to be treated differently.

test <- structure(list(type = c(7345L, 7345L, 7345L, 7345L, 7345L, 
7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 
7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L, 7345L
), x = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("No", "Yes"), class = "factor"), y = c(1.66703903751618, 
0, 0.899002060282742, 1.77844476717205, 0.858205995526113, 1.77844476717205, 
0.894654725714929, 2.28497216539696, 0, 0.899002060282742, 2.28497216539696, 
2.85895315127563, 2.85895315127563, 0, 2.85895315127563, 0.858205995526113, 
0.894654725714929, 1.66703903751618, 1.66703903751618, 0, 0, 
1.66703903751618, 0.894654725714929), z = c(6.67, 
0, 3.33, 6.67, 3.33, 6.67, 2, 6.67, 3.33, 3.33, 2, 3.33, 3.33, 
2, 3.33, 6.67, 6.67, 6.67, 2, 6.67, 3.33, 6.67, 2)), row.names = c(NA, 
-23L), class = c("tbl_df", "tbl", "data.frame"))

# And the code that I attempted:

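library(dplyr)

# Keep only groups with exactly one row where x == "Yes" and y == min(y).
# Note: min(y) here is taken over every row in the group, not only the "Yes" rows.
test <- test %>%
  group_by(type) %>%
  arrange(type) %>%
  filter(sum(y == min(y) & x == "Yes") == 1) %>%
  ungroup()

# Within each group, keep "Yes" only on the row at the minimum y.
test <- test %>%
  group_by(type) %>%
  mutate(x = case_when(y == min(y) & x == "Yes" ~ "Yes",
                       TRUE ~ "No"))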

Basically, I am trying to assign "Yes" to exactly one x per group. If there is more than one candidate, the tie is broken by y. If there is still a tie, it is broken by z, and so on, I hope. The solution to my original question helped me identify the correct row in each group, but as a side effect that row gets separated from the rest of its group. This stops me from doing the next step: assigning x = "No" to all rows that do not satisfy the sequential conditions/filters.
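One way to picture the goal described above is to rank the rows inside each group instead of filtering them out, so the chosen row never leaves its group. The following is only a minimal sketch under assumptions: `toy` is hypothetical data (not the dataset from the question), `row_id` is a helper column introduced here to restore the original row order, and the tie-break order (smallest y, then largest z) is taken from the description in this question. If rows remain tied on every key, this sketch simply takes whichever row comes first.

```r
library(dplyr)

# Hypothetical toy data, not the dataset from the question
toy <- tibble(
  type = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  x    = c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"),
  y    = c(0.5, 0.5, 0.1, 0.9, 2.0, 0.0, 1.0),
  z    = c(2, 6, 9, 1, 3, 3, 3)
)

result <- toy %>%
  mutate(row_id = row_number()) %>%             # remember the original order
  group_by(type) %>%
  # "Yes" rows first, then smallest y, then largest z
  arrange(desc(x == "Yes"), y, desc(z), .by_group = TRUE) %>%
  # only the top-ranked row keeps "Yes", and only if it was "Yes" to begin with
  mutate(x = if_else(row_number() == 1 & x == "Yes", "Yes", "No")) %>%
  ungroup() %>%
  arrange(row_id) %>%                           # restore the original order
  select(-row_id)
```

Because no rows are dropped, every other row in the group is set to "No" in the same step. Further tie-breakers would be added as extra arguments to `arrange()`.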

This is part of a longer script in which I'm trying to determine which row should have x = "Yes" in each group. The preceding step (not pasted here) gathers all the type groups that contain more than one x = "Yes" into the test (sub)dataframe.

Originally, I tried building a long chain of ifelse() statements that became more specific at each fork, ending in multiple end nodes from which I could then assign the correct x = "Yes", i.e.:

  • if x = "Yes" > 1 in the group
    • y = min(y) -> x = "Yes" (end)
    • if multiple y = min(y)
      • z = max(z) -> x = "Yes" (end)
      • if multiple z = max(z)
        • etc.
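The decision tree above can also be read as successively narrowing a set of candidate rows. The sketch below is one possible way to express that, not a confirmed solution: `pick_one()` is a hypothetical helper written for this illustration, and `toy` is made-up data. Each criterion is applied only while more than one candidate remains, and the result is a logical vector the same length as the group, so no rows are removed.

```r
library(dplyr)

# Hypothetical helper: narrow the candidate rows one criterion at a time.
# Returns a logical vector marking the surviving candidate(s).
pick_one <- function(is_yes, y, z) {
  keep <- is_yes
  # more than one "Yes": keep those at the minimum y among the candidates
  if (sum(keep) > 1) keep <- keep & y == min(y[keep])
  # still tied: keep those at the maximum z among the remaining candidates
  if (sum(keep) > 1) keep <- keep & z == max(z[keep])
  keep
}

# Hypothetical toy data, not the dataset from the question
toy <- tibble(
  type = c(1L, 1L, 1L, 2L, 2L),
  x    = c("Yes", "Yes", "Yes", "Yes", "No"),
  y    = c(0.2, 0.2, 0.7, 1.5, 0.1),
  z    = c(3, 5, 9, 1, 1)
)

flagged <- toy %>%
  group_by(type) %>%
  mutate(x = if_else(pick_one(x == "Yes", y, z), "Yes", "No")) %>%
  ungroup()
```

If rows are still tied after the last criterion, this sketch leaves more than one "Yes" in the group, which makes unresolved ties visible; another `if (sum(keep) > 1)` line would be needed for each further tie-breaker.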

I also tried solving this with case_when() and if() calls, but it quickly became unmanageable and didn't work.

test <- test %>%
  group_by(type) %>%
  arrange(type) %>%
  filter(x == "Yes") %>%
  filter(y == min(y)) %>%
  ungroup()

test <- test %>%
  group_by(type) %>%
  mutate(x = case_when(y == min(y) & x == "Yes" ~ "Yes",
                       TRUE ~ "No"))
