I have the dataset below, which contains weekly sales data and other data, broken out by group (Market):
df
  Market Week Sales Other_data1 Other_data2
1      1    1     5          30         -40
2      1    2     4           7          -8
3      1    3     7         100           9
4      1    4    11          92          50
5      2    1     8           0           8
6      2    2     5           0          14
7      2    3     8           9          98
8      2    4     1           3           3
My goal is to normalize the data in two different ways: mean normalization and min normalization. Mean normalization is applied to the sales data, whereas min normalization is applied to the non-sales data. I think I have the mean normalization correct, but the min normalization is trickier because I have conditions on the data being selected. Below is what I have currently.
library(dplyr)

## Column names used for grouping and for the date
group <- "Market"
date <- "Week"

## Function to standardize sales by dividing by the standard deviation of sales
normalized_mean <- function(x) {
  return(x / sd(x))
}

## Function to standardize variables by subtracting min
## Used for non-sales data
normalized_min <- function(x) {
  out <- ifelse(x > 0, ((x - min(x)) / sd(x)),
                ifelse(x < 0, ((x + max(x)) / sd(x)),
                       ifelse(x == 0, 0, 0)))
  return(out)
}
if (!("Sales" %in% colnames(df))) {
  df_index <- df %>%
    dplyr::group_by(!!sym(group)) %>%
    dplyr::mutate_at(vars(-one_of(!!group, !!date)), normalized_min)
} else {
  df_index <- df %>%
    dplyr::group_by(!!sym(group)) %>%
    dplyr::mutate_at(vars(-one_of(!!group, !!date)), normalized_mean)
}
The current output of this is:
df_index
  Market Week Sales Other_data1 Other_data2
1      1    1  1.62       0.655       -1.07
2      1    2  1.29       0.153      -0.213
3      1    3  2.26        2.18       0.240
4      1    4  3.55        2.01        1.33
5      2    1  2.41           0       0.178
6      2    2  1.51           0       0.311
7      2    3  2.41        2.12        2.17
8      2    4 0.302       0.707      0.0666
The output should be this:
  Market Week Sales Other_data1 Other_data2
1      1    1  1.62       0.501     0.26679
2      1    2  1.29           0     1.12053
3      1    3  2.26        2.02     1.30729
4      1    4  3.55        1.85     2.40114
5      2    1  2.41           0     7.93342
6      2    2  1.51           0     13.9334
7      2    3  2.41       2.121     97.9334
8      2    4 0.302       0.707     2.93342
My issue is with the normalized_min function, repeated below. How do I make the conditions work for this sort of example? It looks like the conditions x>0, x<0, and x==0 are not being taken into account.
normalized_min <- function(x) {
  out <- ifelse(x > 0, ((x - min(x)) / sd(x)),
                ifelse(x < 0, ((x + max(x)) / sd(x)),
                       ifelse(x == 0, 0, 0)))
  return(out)
}
Any help would be great, thanks!
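One possible reading, offered as a sketch rather than a confirmed answer: the conditions inside normalized_min do work, but the function is never called. Because df contains a Sales column, the if/else takes the else branch and applies normalized_mean to every non-grouping column, which is why the current output equals x / sd(x) everywhere. The sketch below applies each function to its own columns within a single grouped mutate. It assumes dplyr 1.0.0 or later (for across()), and the data frame is retyped from the table in the post.

```r
library(dplyr)

# Sample data retyped from the table in the post
df <- data.frame(
  Market      = c(1, 1, 1, 1, 2, 2, 2, 2),
  Week        = c(1, 2, 3, 4, 1, 2, 3, 4),
  Sales       = c(5, 4, 7, 11, 8, 5, 8, 1),
  Other_data1 = c(30, 7, 100, 92, 0, 0, 9, 3),
  Other_data2 = c(-40, -8, 9, 50, 8, 14, 98, 3)
)

group <- "Market"
date  <- "Week"

# Same definitions as in the post, only condensed
normalized_mean <- function(x) x / sd(x)

normalized_min <- function(x) {
  ifelse(x > 0, (x - min(x)) / sd(x),
         ifelse(x < 0, (x + max(x)) / sd(x), 0))
}

df_index <- df %>%
  group_by(across(all_of(group))) %>%
  mutate(
    # Sales (if present) is divided by its per-group standard deviation
    across(any_of("Sales"), normalized_mean),
    # Every other non-grouping column except the date gets normalized_min
    across(!any_of(c(date, "Sales")), normalized_min)
  ) %>%
  ungroup()

print(df_index)
```

With this dispatch, Sales and Other_data1 line up with the expected output, as does Other_data2 for Market 1. For Market 2, Other_data2 comes out as (x - min(x)) / sd(x), which does not match the expected values listed above; those values look like x - min(x) / sd(x) (for example 8 - 3/45.06 ≈ 7.933), so the intended formula for that case may need to be double-checked.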