My real dataset is an xts object with 4 columns and 110,000 rows of signal output values. I would like to remove some of those values based on a somewhat arbitrary criterion.
Taking the sample_matrix dataset from xts as an example, my code looks like this:
require(xts)
require(zoo)
data("sample_matrix")
myxts <- as.xts(sample_matrix)
for (colonne in 1:ncol(myxts)) {
  for (i in 2:nrow(myxts)) {
    # j = points before i, k = points after i, clipped at the ends of the series
    if (i < 11) {
      j <- i - 1
      k <- 10
    } else if (i > nrow(myxts) - 10) {
      j <- 10
      k <- nrow(myxts) - i
    } else {
      j <- 10
      k <- 10
    }
    # parentheses are required here: `:` binds tighter than `-` and `+`
    win <- myxts[(i - j):(i + k), colonne]
    if (myxts[i, colonne] > mean(win) + 5 * sd(win)) {
      myxts[i, colonne] <- NA
      myxts <- na.approx(myxts)
    }
  }
}
What I'm doing is removing any value that is greater than the mean + 5x the standard deviation of the 20 adjacent values (10 on each side). This code runs, but it is slow and most likely not optimised.
The two if statements are there to avoid computing mean and sd with a subscript out of bounds.
I want to reduce the code using rollmean and rollapply, but I have no idea how to do it.
So far this is what I think it should look like:
for (i in 1:nrow(myxts)) {
  if (myxts[i,] > rollmean(myxts[i,], k = 20) + 5 * rollapply(myxts[i,], width = 20, FUN = sd)) {
    myxts[i,] <- NA
    myxts <- na.approx(myxts)
  }
}
But this leads to the following error:
Error in rollapply.xts(x, k, FUN = (mean), fill = fill, align = align, : width <= nr is not TRUE
I don't know how to make the rollmean "follow" i.
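For context, the error most likely comes from passing a single row, myxts[i,], to the rolling functions: a width of 20 cannot fit into one row, hence "width <= nr is not TRUE". The rolling functions are meant to receive the whole series at once, so no explicit loop over i is needed. Below is a minimal sketch of that idea, not a definitive solution. It assumes a centred window of 21 values (10 neighbours on each side plus the point itself) and uses partial = TRUE so the window shrinks at the ends of the series; the names win, n_sd, outlier and cleaned are illustrative only. Unlike the loop, it computes every window from the original data rather than from progressively interpolated data.

```r
library(xts)
library(zoo)  # provides rollapply() and na.approx()

data("sample_matrix")
myxts <- as.xts(sample_matrix)

win  <- 21  # assumed window: 10 points before, the point itself, 10 after
n_sd <- 5   # threshold in standard deviations

# Convert to zoo so that zoo's rollapply method (which supports partial=) is used.
z <- as.zoo(myxts)

# partial = TRUE shrinks the window at the start and end of the series,
# which replaces the two edge-case if statements of the loop version.
roll_mean <- rollapply(z, width = win, FUN = mean, align = "center", partial = TRUE)
roll_sd   <- rollapply(z, width = win, FUN = sd,   align = "center", partial = TRUE)

# Logical matrix with the same shape as the data: TRUE marks a value to drop.
outlier <- coredata(z) > coredata(roll_mean) + n_sd * coredata(roll_sd)

# Blank the flagged values and interpolate once, for all columns together.
vals <- coredata(myxts)
vals[outlier] <- NA
cleaned <- xts(vals, order.by = index(myxts))
cleaned <- na.approx(cleaned, na.rm = FALSE)  # keep leading/trailing NAs, if any
```

One caveat about the criterion itself: when the tested point is part of its own window of n values, it can be at most (n - 1) / sqrt(n) sample standard deviations away from the window mean, which is about 4.36 for n = 21. A threshold of 5 therefore never fires with this window; excluding the centre point from the statistics or lowering n_sd would be needed for the filter to flag anything.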
Any help is welcome!