Thursday, 30 May 2019

How to efficiently check whether a vector of numbers is in an interval defined by a data frame

I have a vector n1 of numeric values (randomized in the example code below). I also have a data frame df.int with one column of interval upper limits and one column of associated values (again randomized here; in reality they are modes of another quantity). For each entry of n1, I want to find the interval of df.int it falls into and then overwrite that entry with the value in the second column for that interval.
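
To make the intended mapping concrete, here is a minimal sketch on made-up toy data. The names upper, value and x are only for illustration and are not part of the real script, and it assumes each interval includes its upper limit, as the `<=` comparison in the code does.

```r
# Toy data (made up for illustration only)
upper <- c(10, 20, 30)   # upper limits of the intervals
value <- c(7, 15, 22)    # value attached to each interval
x <- c(5, 10, 11, 30)    # entries to look up

# For each entry, take the first interval whose upper limit is >= the entry
idx <- sapply(x, function(xi) which(xi <= upper)[1])
result <- value[idx]
print(result)  # 7 7 15 22
```

So 5 and 10 fall into the first interval, 11 into the second, and 30 into the third.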

In principle my code should work, but because n1 and the list of intervals are both long, the script takes too long to run. How can I change the code so that it runs more efficiently?

Here is the code:

set.seed(123)
seq.vec <- seq(400, 800000, by = 200)
n1 <- sample(100:800000, 2000, replace = TRUE)
df.int <- data.frame(
  Upper.Limit = seq.vec,
  Value = sample(100:800000, length(seq.vec), replace = TRUE)
)
# Check every element of n1 against every interval (this is the slow part)
for (m in seq_along(n1)) {
  for (j in seq_along(df.int$Upper.Limit)) {
    # The first interval has no lower limit
    lower <- if (j == 1) -Inf else df.int$Upper.Limit[j - 1]
    if (n1[m] > lower && n1[m] <= df.int$Upper.Limit[j]) {
      n1[m] <- df.int$Value[j]
      break  # stop here so the replaced value is not matched again
    }
  }
}
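
One possible direction, as a sketch rather than a definitive solution: base R's findInterval() does the interval lookup for the whole vector in a single call, so no explicit loop is needed. The helper name map_to_interval_value is hypothetical. The sketch assumes the upper limits are sorted in ascending order, that each interval includes its upper limit, and that entries above the last limit should stay unchanged.

```r
# Hypothetical helper: replace each x by the value of the interval
# (previous limit, upper limit] it falls into.
map_to_interval_value <- function(x, upper, value) {
  # findInterval() requires the break points to be sorted ascending
  stopifnot(!is.unsorted(upper), length(upper) == length(value))
  # With left.open = TRUE, findInterval() counts the limits strictly below x,
  # so adding 1 gives the index of the first upper limit >= x.
  idx <- findInterval(x, upper, left.open = TRUE) + 1L
  # Assumption: entries above the last limit are left as they are
  inside <- idx <= length(upper)
  out <- x
  out[inside] <- value[idx[inside]]
  out
}

# Same example data as in the question
set.seed(123)
seq.vec <- seq(400, 800000, by = 200)
n1 <- sample(100:800000, 2000, replace = TRUE)
df.int <- data.frame(
  Upper.Limit = seq.vec,
  Value = sample(100:800000, length(seq.vec), replace = TRUE)
)
n1.new <- map_to_interval_value(n1, df.int$Upper.Limit, df.int$Value)
```

Writing the result to a new vector (n1.new here) instead of overwriting n1 keeps the original values available for checking.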

Thanks!
