Suppose I have a data frame in which each column holds the seasonal sales of one company.
I would like to replace zeros, and values larger than 3*SD of that column, with the value from the same season in the next period. If that value is also 0 or an outlier, R should check the following period, and so on until all values have been checked.
An example is given below.
An original column from the data frame:
company_1
1 123
2 0
3 567
4 0
5 987
6 678
7 657
8 567
9 543
10 345
11 2341
12 5432
Sales are recorded for only 6 months per period, so rows 1-6 form one period and rows 7-12 form the next.
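To make the layout concrete, here is a small sketch that reshapes the example column into a matrix with one row per season and one column per period. The labels `season_*` and `period_*` are made up purely for illustration.

```r
# Example column from above
company_1 <- c(123, 0, 567, 0, 987, 678, 657, 567, 543, 345, 2341, 5432)

# matrix() fills column by column, so each column is one 6-month period
# and each row collects the same season across periods.
seasons <- matrix(
  company_1,
  nrow = 6,
  dimnames = list(paste0("season_", 1:6), paste0("period_", 1:2))
)
print(seasons)
```

Read this way, the 0 in row 2 sits in `season_2` of the first period, and its candidate replacement is the `season_2` value of the second period.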
I have tried to do this with a loop, an if condition and a matrix. This is the code I used:
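# Threshold per column: 3 * SD, computed on the original values
sd.value <- as.numeric(apply(df, 2, function(x) sd(x, na.rm = TRUE)))
for (i in 1:dim(df)[2]) {
  # Rows = the 6 seasons, columns = the periods (2 here)
  m <- matrix(data = df[, i], nrow = 6, ncol = 2)
  for (j in 1:dim(df)[1]) {
    if (df[j, i] == 0 | df[j, i] >= sd.value[i] * 3) {
      # Season of row j (1 to 6); indexing m[j, ] directly fails once j > 6
      s <- (j - 1) %% 6 + 1
      # First value of that season that is neither 0 nor an outlier
      df[j, i] <- m[s, ][m[s, ] != 0 & m[s, ] <= sd.value[i] * 3][1]
    }
  }
}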
Basically, R fills all values of a column into a matrix, where ncol controls how many periods (years) it has. The first row of that matrix then holds the first-month sales of every period. That way, R can pick a proper value from the same row to replace a 0 or an outlier.
The goal is to have R check the 1st, 2nd, 3rd, ..., 6th rows of that matrix for rows 1, 2, 3, ..., 6 of the column, again for rows 7, 8, 9, ..., 12, and again for rows 13, ..., 18, and so on.
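A minimal sketch of that idea, written as a function applied to each column, might look like the following. The name `replace_seasonal` and its `period` argument are hypothetical. It assumes the column length is a multiple of the period, keeps my rule of treating any value at or above 3*SD of the original column as an outlier, and takes the first valid value found for that season in any period.

```r
# Hypothetical helper: replace zeros and outliers in one column with the
# first valid value recorded for the same season in any period.
replace_seasonal <- function(x, period = 6) {
  stopifnot(length(x) %% period == 0)   # assumption: only complete periods
  limit <- 3 * sd(x, na.rm = TRUE)      # threshold from the original values

  # One row per season, one column per period (filled column by column)
  seasons <- matrix(x, nrow = period)
  bad <- is.na(seasons) | seasons == 0 | seasons >= limit

  for (s in seq_len(period)) {
    good <- seasons[s, !bad[s, ]]
    # If a season has no valid value at all, its entries are left unchanged
    if (length(good) > 0) {
      seasons[s, bad[s, ]] <- good[1]
    }
  }
  as.vector(seasons)                    # back to the original row order
}

df <- data.frame(
  company_1 = c(123, 0, 567, 0, 987, 678, 657, 567, 543, 345, 2341, 5432)
)
cleaned <- df
cleaned[] <- lapply(df, replace_seasonal, period = 6)
print(cleaned)
```

With the example column, the two zeros would be replaced by 567 and 345 (the same seasons in the second period), and 5432 by 678 (the same season in the first period), since 3*SD of the column is roughly 4562.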
These are just my thoughts; if you have a better solution, that would be great.