I have an hourly data set with three variables (time, a, b) and want to look at the standard deviation of "b" on specific days that contain outliers in "a". How can I do this? The idea is: if a value of "a" is above a certain threshold (e.g. 99, as in the following example), what is the standard deviation of "b" over that whole day, and what is the sd of "b" on the day before and the day after? I'll try to clarify the problem with an example:
set.seed(1)
df = data.frame(
  "time" = seq(
    from = as.POSIXct("2016-05-01 00:00", tz = "Europe/Berlin"),
    to   = as.POSIXct("2016-05-04 23:00", tz = "Europe/Berlin"),
    by   = "hour"),
  "a" = runif(96, min = 0, max = 100),
  "b" = runif(96, min = 1200, max = 30000))
If this is the data, I would like to write a command like this:
# Pseudocode, not runnable: the sd(...) parts only describe what I am after
test = data.frame(
  "time" = df$time,
  "extreme" = ifelse(df$a > 99,
                     sd(#take the sd of "b" for the day where df$a > 99 occurred
                     ) & sd(#and for the day before and after
                     ),
                     0))
test = subset(test, extreme > 0) # to have a data frame with the important values only
I appreciate any help.
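One possible way to get there, as a minimal base-R sketch and not a definitive solution: compute the sd of "b" once per calendar day, find the days on which "a" exceeds the threshold, and look up the sd for each such day and its neighbours. The names `threshold`, `day`, `sd_by_day`, `sd_on` and `result` are made up for illustration, and the sketch assumes a "day" means a calendar day in the time zone of `df$time`. Days outside the data (such as the day before the first day) come back as NA.

```r
set.seed(1)
df <- data.frame(
  time = seq(
    from = as.POSIXct("2016-05-01 00:00", tz = "Europe/Berlin"),
    to   = as.POSIXct("2016-05-04 23:00", tz = "Europe/Berlin"),
    by   = "hour"),
  a = runif(96, min = 0, max = 100),
  b = runif(96, min = 1200, max = 30000))

threshold <- 99  # hypothetical name for the cutoff from the question

# Calendar day of each observation, taken in the time zone stored on df$time
df$day <- as.Date(format(df$time, "%Y-%m-%d"))

# sd of "b" per day, as a plain numeric vector named "YYYY-MM-DD"
sd_by_day <- vapply(split(df$b, df$day), sd, numeric(1))

# Look up the daily sd for a vector of dates; dates not in the data give NA
sd_on <- function(d) unname(sd_by_day[as.character(d)])

# Days on which at least one value of "a" exceeds the threshold
outlier_days <- unique(df$day[df$a > threshold])

result <- data.frame(
  day       = outlier_days,
  sd_before = sd_on(outlier_days - 1),
  sd_day    = sd_on(outlier_days),
  sd_after  = sd_on(outlier_days + 1))

print(result)
```

This gives one row per outlier day instead of one row per hour, which avoids repeating the same three numbers 24 times. If the hourly layout from the question is needed, `merge(df, result, by = "day")` should attach the three columns to the hours of the outlier days.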