jeudi 1 novembre 2018

Min Normalization by condition

I have a dataset below that has sales data and other data by week broken out by group:

df

  Market  Week Sales Other_data1 Other_data2
1      1     1     5          30         -40
2      1     2     4           7          -8
3      1     3     7         100           9
4      1     4    11          92          50
5      2     1     8           0           8
6      2     2     5           0          14
7      2     3     8           9          98
8      2     4     1           3           3

My goal is to normalize the data two different ways: mean normalization and min normalization. Mean normalization is done to the sales data whereas min normalization is done to the non-sales. I think I have the mean normalization correct but the min normalization is a bit more tricky because I have conditions on the data being selected. Below is what I have currently.

##Function to standardizing variables
group = "Market"
date = "Week"

##Function to standardize sales by dividing by the standard deviation of sales
normalized_mean <- function(x){
  return(x/(sd(x)))
}

##Function to standardize variables by subtracting min
##Used for non-sales data
normalized_min<-function(x){
  out<- ifelse(x>0, ((x-min(x)) / sd(x)),
               ifelse(x<0, ((x+max(x)) / sd(x)), 
                      ifelse(x==0, 0,0)))
  return(out)
}

if (!("Sales" %in% colnames(df))){
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_min)
} else {
  df_index<-df %>% 
    dplyr::group_by(!!sym(group)) %>% 
    dplyr::mutate_at(vars(-one_of(!!group,!!date)), normalized_mean)

}

The current output of this is:

df_index

  Market  Week Sales Other_data1 Other_data2
1      1     1 1.62        0.655     -1.07  
2      1     2 1.29        0.153     -0.213 
3      1     3 2.26        2.18       0.240 
4      1     4 3.55        2.01       1.33  
5      2     1 2.41        0          0.178 
6      2     2 1.51        0          0.311 
7      2     3 2.41        2.12       2.17  
8      2     4 0.302       0.707      0.0666

The output should be this:

  Market  Week Sales Other_data1 Other_data2
1      1     1 1.62        0.501     0.26679  
2      1     2 1.29            0     1.12053
3      1     3 2.26         2.02     1.30729
4      1     4 3.55         1.85     2.40114 
5      2     1 2.41            0     7.93342
6      2     2 1.51            0     13.9334
7      2     3 2.41        2.121     97.9334
8      2     4 0.302       0.707     2.93342

My issue is this formula below.

How do I make the conditions work for this sort of example? It looks like it isn't taking the conditions of x>0, x<0, and x==0 into account.

normalized_min<-function(x){
  out<- ifelse(x>0, ((x-min(x)) / sd(x)),
               ifelse(x<0, ((x+max(x)) / sd(x)), 
                      ifelse(x==0, 0,0)))
  return(out)
}

Any help would be great, thanks!

Aucun commentaire:

Enregistrer un commentaire