I have a dataset with a number of grouping variables (t01-t05 in this dummy example, but many more in the real dataset). For every combination of group levels I calculate pred as the proportion of rows with target == 1, i.e. sum(target == 1) / n() (this is the 5th element of ns_by_group_list). However, if the total number of people in a combination (the s variable) is less than 6, I need to use the pred value from the equivalent t01-t04 combination (the 4th element of ns_by_group_list). If that one is also less than 6, then from the t01-t03 combination (the 3rd element of ns_by_group_list), and so on. The final output should look like ns_by_group_list[[5]], but with pred values coming from different elements of ns_by_group_list.
I was thinking of renaming the pred and s variables in the different list elements to pred1, pred2, ..., pred5, joining everything into one data.frame, and then writing a long case_when statement. But surely there's a better, more elegant way to do it?
library(tibble)
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)
## functions ####
create_t_labels <- function(n) {
  paste0('t', str_pad(1:n, 2, 'left', '0'))
}
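ns_by_group <- function(group_vars) {
  input %>%
    group_by(across(all_of(group_vars))) %>% # replaces the superseded group_by_at()
    summarise(n = n(), .groups = 'drop') %>% # total number of people in each group
    # replaces the superseded spread(); missing cells become 0
    pivot_wider(names_from = target, values_from = n, values_fill = 0) %>%
    mutate(n = `1`, # number of people with target == 1
           s = n + `0`, # total number of people in the combination
           pred = round(n / s, 3)
    ) %>%
    select(-c(`1`, `0`))
}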
### input data ####
set.seed(1)
input <- tibble(
  target = sample(0:1, 50, replace = TRUE),
  t01 = sample(1:3, 50, replace = TRUE),
  t02 = rep(1:2, each = 25),
  t03 = rep(1:5, each = 10),
  t04 = rep(1, 50),
  t05 = rep(1:2, each = 25)
)
## calculations ####
group_combo_list <- map(1:5, create_t_labels)
group_combo_list <- map(group_combo_list, function(x) c(x, 'target'))
ns_by_group_list <- map(group_combo_list, ns_by_group)
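To make the fallback rule described above concrete, here is a minimal sketch of one way it could be expressed without a long case_when: summarise each nesting level once, then walk from the t01-t04 level down to the t01 level, joining in a fallback pred only where no pred has been accepted yet. The names t_vars, min_s, level_summary and pred_level are hypothetical helpers introduced for illustration. The sketch also assumes that a row keeps pred = NA if even its t01-only group has fewer than 6 people, which the description above does not specify.

```r
library(dplyr)
library(purrr)

# Same dummy data as above, repeated so the sketch runs on its own
set.seed(1)
input <- tibble(
  target = sample(0:1, 50, replace = TRUE),
  t01 = sample(1:3, 50, replace = TRUE),
  t02 = rep(1:2, each = 25),
  t03 = rep(1:5, each = 10),
  t04 = rep(1, 50),
  t05 = rep(1:2, each = 25)
)

t_vars <- paste0("t0", 1:5) # hypothetical helper: grouping variable names
min_s <- 6                  # minimum group size for pred to be accepted

# Summary for nesting level k, i.e. grouped by t01..t0k
level_summary <- function(k) {
  input %>%
    group_by(across(all_of(t_vars[seq_len(k)]))) %>%
    summarise(s = n(),                # total number of people in the group
              n = sum(target == 1),   # people with target == 1
              pred = round(n / s, 3),
              .groups = "drop")
}

levels_list <- map(seq_along(t_vars), level_summary)

# Start from the most detailed level; blank out pred where the group is too small
result <- levels_list[[5]] %>%
  mutate(pred_level = if_else(s >= min_s, 5L, NA_integer_),
         pred = if_else(s >= min_s, pred, NA_real_))

# Walk down the coarser levels and fill only the rows that are still missing
for (k in 4:1) {
  fallback <- levels_list[[k]] %>%
    filter(s >= min_s) %>%
    select(all_of(t_vars[seq_len(k)]), pred_fb = pred)

  result <- result %>%
    left_join(fallback, by = t_vars[seq_len(k)]) %>%
    mutate(pred_level = if_else(is.na(pred) & !is.na(pred_fb), k, pred_level),
           pred = coalesce(pred, pred_fb)) %>%
    select(-pred_fb)
}

result
```

Here result has the same rows, n and s as the most detailed summary, while pred_level records which nesting level each pred was taken from, which may help when checking the fallback by hand.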