Tuesday, 20 December 2016

Fixing data based on multiple conditions in a loop in R

Suppose I have a data frame in which each column holds the seasonal sales of one company.

I would like to replace every 0, and every value larger than 3*SD of its column, with the value from the same month of the next season. If that value is also 0 or an outlier, R should check the one after it, and so on until all candidate values have been checked.

An example is given below.

An original column from the data frame:

    company_1
1     123
2     0
3     567
4     0
5     987
6     678
7     657
8     567
9     543
10    345
11    2341
12    5432

Sales are recorded for only 6 months per year, so rows 1-6 form one period and rows 7-12 form the next.

I've tried to achieve this with if/else inside a loop and a matrix. This is the code I used:

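    # Per-column standard deviation, used for the 3*SD outlier threshold
    sd.value <- as.numeric(apply(df, 2, function(x) sd(x, na.rm = TRUE)))
    for (i in 1:dim(df)[2]) {
      # Reshape the column: one row per month, one column per period
      m <- matrix(data = df[, i], nrow = 6)
      for (j in 1:dim(df)[1]) {
        if (df[j, i] == 0 | df[j, i] >= sd.value[i] * 3) {
          # Same month across periods; the index wraps so that j = 7 maps to row 1
          same.month <- m[(j - 1) %% 6 + 1, ]
          # Take the first value that is neither 0 nor an outlier
          df[j, i] <- same.month[same.month != 0 & same.month < sd.value[i] * 3][1]
        }
      }
    }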

Basically, R fills all the values of a column into a matrix, where ncol controls how many years there are. The first row of that matrix then holds the first-month sales across all years. That way, R can select a proper value to replace a 0 or an outlier.
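To make the reshaping concrete, here is a minimal sketch using the example column from above. The names `sales` and `by_month` are illustrative only and do not appear in the original code.

```r
# Example column from above: 12 values = 2 periods of 6 months
sales <- c(123, 0, 567, 0, 987, 678, 657, 567, 543, 345, 2341, 5432)

# matrix() fills column by column, so each column is one period
# and each row collects the same month across periods
by_month <- matrix(sales, nrow = 6)

# Month 2 in both periods: the values from rows 2 and 8 of the column
by_month[2, ]
```

Row 2 of `by_month` holds 0 and 567, so 567 is the candidate replacement for the 0 in row 2 of the original column.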

The goal is to have R check the 1st, 2nd, 3rd, ..., 6th rows of that matrix as the row index runs through 1, 2, 3, ..., 6, then 7, 8, 9, ..., 12, then 13, ..., 18, and so on.

The above is just my own approach; if you have a better solution, that would be great.
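One possible direction, offered only as a sketch: wrap the per-column logic in a helper and apply it to every column. The function name `fix_column` and its `period` argument are hypothetical. The sketch assumes that the column length is a multiple of the period, that "outlier" means strictly greater than 3*SD of the whole column, and that the replacement is the first valid value found for the same month in any period (the same choice the loop above makes with `[1]`).

```r
# Hypothetical helper: replace zeros and values above 3 * SD in one column
# with the first valid value observed for the same month in any period.
fix_column <- function(x, period = 6) {
  stopifnot(length(x) %% period == 0)
  x <- as.numeric(x)
  limit <- 3 * sd(x, na.rm = TRUE)
  bad <- is.na(x) | x == 0 | x > limit

  m <- matrix(x, nrow = period)          # rows = months, columns = periods
  valid <- matrix(!bad, nrow = period)   # TRUE where a value is usable

  # First valid value per month, or NA when a month has no valid value
  first_valid <- vapply(seq_len(period), function(r) {
    ok <- m[r, valid[r, ]]
    if (length(ok) > 0) ok[1] else NA_real_
  }, numeric(1))

  # Month index of every element (1..period, repeating)
  month <- (seq_along(x) - 1) %% period + 1
  x[bad] <- first_valid[month[bad]]
  x
}

sales <- c(123, 0, 567, 0, 987, 678, 657, 567, 543, 345, 2341, 5432)
fixed <- fix_column(sales)
```

With the example column, the two zeros are replaced by 567 and 345, and 5432 (above 3*SD) is replaced by 678. For a whole data frame, `df[] <- lapply(df, fix_column)` would apply the same rule column by column while keeping the data frame structure.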
