Friday, 3 July 2015

Simplify nested for loops with if/else statements in R - mapply?

I have the following mess of code, which does what I want. However, in reading about R, I keep seeing that for loops are incredibly slow and should be avoided whenever possible.

My actual datasets that I need to apply this code to contain 2,000,000+ data points, so speed is a significant concern. I've been reading up on mapply, but I am brand new to coding, and really unsure of how to make it work, since I have if/else statements within my for loops.

Moreover, if I understand correctly, the function passed to mapply has to take the same number of parameters as the number of arguments you supply? My sum.bin calculation only has one variable, but I want to apply it over 3 sets of data: length.feature, num.bins, and length.data. Any suggestions?
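For reference, a minimal sketch of how mapply pairs up its inputs: it calls the function once per position, taking the i-th element of each vector as one argument. The function add.three and its values are hypothetical, made up purely for illustration, and are not part of the code in this question.

```r
# Hypothetical toy function with three parameters (illustration only).
add.three <- function(a, b, c) a + b + c

# mapply walks the three vectors in parallel, so it evaluates
# add.three(1, 10, 100), add.three(2, 20, 200), add.three(3, 30, 300).
result <- mapply(add.three, 1:3, c(10, 20, 30), c(100, 200, 300))
print(result)
```

So the function needs one parameter per vector supplied; values that stay constant across calls can be passed through the MoreArgs argument instead.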

for(x in 1:length.feature){
  for(y in 1:num.bins){
    sum.bin <- 0
    count <- 0
    bin.start <- feature.bins[x,y]
    bin.end <- feature.bins[x,(y + 1)]
    for(i in j:length.data){
      if(data.arm[i] == feature.arm[x]){
        if(data.position[i] < bin.start){next}
        if(data.position[i] > bin.end){break}
        sum.bin <- sum.bin + data.value[i]
        count <- count + 1
        z <- i
      }
      else{next}
    }
    j <- z - count
    if(j < 1){j <- 1}
    feature.value[n] <- sum.bin
    n <- n + 1
  }
}

My input data is essentially 3 data frames, which I break apart into smaller pieces in my code to work with (e.g. length.feature <- dim(feature)[1]).

# Dimensions of the three data frames (rows x columns):
# data:         2772122 x 3
# feature:      8538 x 6
# feature.bins: 8538 x 101

The output that I'm looking for is a matrix or data frame of 8538 rows by 100 columns, containing the sum.bin value for each feature (row) and bin (column).
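To make the desired output concrete, here is a hedged sketch of one loop-free way to get such a matrix. It assumes the intent of the loop is "sum data.value over points on the feature's arm with bin.start <= position <= bin.end". It swaps the inner loop for cumulative sums plus findInterval, which is a different technique from the original code. All toy values and the helper name bin.sums are invented for illustration, feature.bins is assumed to be a numeric matrix (e.g. via as.matrix), and the left.open argument needs R 3.3.0 or later.

```r
# Toy stand-ins for the real inputs (values invented for illustration).
data.arm      <- c("2L", "2L", "2L", "2L", "3R", "3R")
data.position <- c(5, 12, 18, 31, 7, 22)
data.value    <- c(1, 2, 3, 4, 10, 20)
feature.arm   <- c("2L", "3R")
feature.bins  <- rbind(c(0, 10, 20, 30),   # 4 edges = 3 bins for feature 1
                       c(0, 15, 30, 45))   # 4 edges = 3 bins for feature 2

# Hypothetical helper: sums for every bin of one feature at once.
bin.sums <- function(arm, breaks) {
  keep <- data.arm == arm
  ord  <- order(data.position[keep])
  pos  <- data.position[keep][ord]            # sorted positions on this arm
  cs   <- c(0, cumsum(data.value[keep][ord])) # cs[k + 1] = sum of first k values
  n.bins <- length(breaks) - 1
  # Number of points strictly before each bin start.
  lo <- findInterval(breaks[1:n.bins], pos, left.open = TRUE)
  # Number of points at or before each bin end.
  hi <- findInterval(breaks[2:(n.bins + 1)], pos)
  cs[hi + 1] - cs[lo + 1]
}

# One row per feature, one column per bin.
feature.value <- t(sapply(seq_along(feature.arm),
                          function(x) bin.sums(feature.arm[x], feature.bins[x, ])))
print(feature.value)
```

Because both bin edges are treated as inclusive, as in the loop, a point lying exactly on a shared edge is counted in both neighbouring bins. With the real data, splitting data by arm once up front (for example with split) would avoid repeating the data.arm == arm comparison for every feature.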
