Tuesday, 6 August 2019

R: refer to a variable in a different list element based on a looped condition

I've got a dataset with a number of variables (t01-t05 in this dummy example, but many more in the real dataset). I calculate the pred variable as the proportion of rows with target == 1 (that is, the count of target == 1 divided by n()) for every combination of the grouping variables (the 5th element of ns_by_group_list). However, if the total number of people in a combination (the s variable) is less than 6, I need to use the pred value from the equivalent t01-t04 combination (the 4th element of ns_by_group_list). If that one also has fewer than 6 people, I need the value from the t01-t03 combination (the 3rd element of ns_by_group_list), and so on. The final output should look like ns_by_group_list[[5]], but with pred values coming from different elements of ns_by_group_list.

I was thinking of renaming the pred and s variables in the different list elements to pred1, pred2, ..., pred5, then pulling everything together into one data.frame and writing a long case_when statement. But surely there's a better, more elegant way to do it?

library(tibble)
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

## functions ####
create_t_labels <- function(n) {
  paste0('t', str_pad(1:n, 2, 'left', '0'))
}
ns_by_group <- function(group_vars) {
  input %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(n = n()) %>%  # number of people per group and target value
    ungroup() %>% 
    spread(key = target, value = n) %>%  # one column per target value: `0` and `1`
    mutate(`0` = replace_na(`0`, 0),
           n = replace_na(`1`, 0),  # people with target == 1
           s = n + `0`,             # total number of people in the group
           pred = round(n/s, 3)     # proportion with target == 1
    ) %>%
    select(-c(`1`, `0`)) 
}
## input data ####
set.seed(1)
input <- tibble(
  target = sample(0:1, 50, replace = TRUE),
  t01 = sample(1:3, 50, replace = TRUE),
  t02 = rep(1:2, each = 25),
  t03 =   rep(1:5, each = 10),
  t04 = rep(1, 50),
  t05 = rep(1:2, each = 25)
)
## calculations ####
group_combo_list <- map(1:5, create_t_labels)
group_combo_list <- map(group_combo_list, function(x) c(x, 'target'))
ns_by_group_list <- map(group_combo_list, ns_by_group)
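
One possible way to avoid the long case_when is to start from the most detailed table and walk down through the coarser ones, replacing pred wherever the group used so far is still too small. The following is only a sketch under assumptions: fallback_pred, min_s, s_used and pred_level are hypothetical names that are not part of the code above, and the three toy tables are hand-made stand-ins shaped like the elements of ns_by_group_list (grouping columns plus n, s and pred).

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not from the original post): for each row of the most
# detailed table, take pred from the most detailed level whose s reaches min_s.
fallback_pred <- function(tbl_list, min_s = 6) {
  n_levels <- length(tbl_list)
  # s_used tracks the group size behind the pred currently held in each row
  out <- tbl_list[[n_levels]] %>%
    mutate(s_used = s, pred_level = n_levels)
  # walk from the second most detailed level down to the coarsest one
  for (lvl in rev(seq_len(n_levels - 1))) {
    coarse <- tbl_list[[lvl]]
    keys <- setdiff(names(coarse), c("n", "s", "pred"))  # grouping columns
    coarse <- select(coarse, all_of(keys), s_coarse = s, pred_coarse = pred)
    out <- out %>%
      left_join(coarse, by = keys) %>%
      mutate(
        too_small  = s_used < min_s,
        pred       = if_else(too_small, pred_coarse, pred),
        pred_level = if_else(too_small, lvl, pred_level),
        s_used     = if_else(too_small, s_coarse, s_used)
      ) %>%
      select(-too_small, -s_coarse, -pred_coarse)
  }
  out
}

# Toy tables (assumed data, for illustration only)
toy_list <- list(
  tibble(t01 = c(1, 2), n = c(6, 1), s = c(10, 4), pred = c(0.6, 0.25)),
  tibble(t01 = c(1, 1, 2), t02 = c(1, 2, 1),
         n = c(5, 1, 1), s = c(7, 3, 4), pred = c(0.714, 0.333, 0.25)),
  tibble(t01 = c(1, 1, 1, 2), t02 = c(1, 2, 2, 1), t03 = c(1, 1, 2, 1),
         n = c(5, 1, 0, 1), s = c(7, 2, 1, 4), pred = c(0.714, 0.5, 0, 0.25))
)

toy_result <- fallback_pred(toy_list)
toy_result$pred        # 0.714 0.600 0.600 0.250
toy_result$pred_level  # 3 1 1 1
```

If this fits the real data, the call would presumably be fallback_pred(ns_by_group_list). Note that a row whose group is still below the threshold at the coarsest level (the last toy row, with s = 4 under t01 alone) simply keeps the coarsest pred; whether that is the desired behaviour is an assumption.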
