Thursday, May 2, 2019

Structuring nested if statements with multiple conditions

I have a data table consisting of five columns, and I would like to create a function to calculate a sixth column based on the relationship between the first three numbers (i, j and p) in each row using if statements. My dataset is structured something like this:

[Image: data table with five columns labeled i, j, p, ID, and pair, and 320 rows. Columns i, j and p hold numbers from 1 to 1500; ID contains all three numbers of a given row separated by commas, and pair contains the i and j numbers of a given row separated by periods.] (https://i.stack.imgur.com/g2cLa.png)

For example, for rows where j > p > i, the value in the new column would be twice the minimum distance between j and p, divided by the sequence length (a value passed to the function as an argument; in this case it is the maximum value in the dataset, 1500). In some cases, such as rows where i > p > j, a different calculation is needed depending on a further condition.

Because the numbers I'm dealing with lie on a circle, I've also created a function to determine the minimum distance between the two points, as that becomes relevant for certain calculations.
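To make the circular distance and the j > p > i case concrete, here is a minimal vectorized sketch. The helper name circ_dist and the sample values for i, p and j are hypothetical and chosen only for illustration; seqlength = 1500 comes from the description above.

```r
# Minimum distance between two positions on a circle of circumference seqlength.
# circ_dist is a hypothetical helper name, not part of the original code.
circ_dist <- function(a, b, seqlength) {
  d <- abs(a - b)            # distance going one way around the circle
  pmin(d, seqlength - d)     # the shorter of the two ways around
}

seqlength <- 1500

# Hypothetical row satisfying j > p > i
i <- 100; p <- 200; j <- 1450

d <- circ_dist(j, p, seqlength)   # direct gap is 1250, wrapping gap is 250
value <- 2 * d / seqlength        # twice the minimum distance over the length
```

Because pmin works element-wise, the same helper accepts whole columns as well as single numbers.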

Here are the functions that I've constructed:

min.dist.calc <- function(x, seqlength) {
    # For each row, take the shorter way around the circle between columns 2 and 3 (j and p).
    apply(x, MARGIN = 1, function(row) {
        d <- abs(as.numeric(row[[2]]) - as.numeric(row[[3]]))
        min(d, seqlength - d)
    })
}

newfunc <- function(x, seqlength){
    if (x$j > x$p & x$p > x$i) {
        dist1 <- min.dist.calc(x, seqlength)
        A1 <- (2*(dist1) / seqlength)
        return(A1)
    } else if (x$i > x$j & x$j > x$p) {
        dist2 <- min.dist.calc(x, seqlength)
        A2 <- (2*(dist2) / seqlength)
        return(A2)
    } else if (x$p > x$i & x$i > x$j) {
        dist3 <- min.dist.calc(x, seqlength)
        A3 <- (2*(dist3) / seqlength)
        return(A3)
    } else if (x$p > x$j & x$j > x$i) {
        if ((x$j - x$i) < (seqlength / 2)) {
            B4 <- ((2*(x$p - x$i)) / seqlength)
            return(B4)
        } else { 
            C4 <- (seqlength - (2*(x$p - x$j)) / seqlength)
            return(C4)
        }
    } else if (x$i > x$p & x$p > x$j) {
        if ((((seqlength - x$i) + (x$j)) < (seqlength / 2))) {
            B5 <- ((2*((seqlength - x$i) + x$p)) / seqlength)
            return(B5)
        } else {
            C5 <- (seqlength - (2*(x$p - x$j)) / seqlength)
            return(C5)
        }
    } else if (x$j > x$i & x$i > x$p) {
        if (((x$j - x$i) < (seqlength / 2))) {
            B6 <- ((2*((seqlength - x$i) + x$p)) / seqlength)
            return(B6)
        } else {
            C6 <- (seqlength - (2*((seqlength - x$j) + x$p)) / seqlength)  
            return(C6)  
        }
    } else {
        return(NA)
    }
}

Ideally, I'd expect to be able to apply (something like) this function to the data table using ijp[, col6 := newfunc(.SD,seqlength), by=.(ID)] and receive an output with the sixth column calculated according to each given condition. However, as it's currently constructed, I'm getting this error:

Error in `[.data.table`(ijp, , `:=`(col6, newfunc(.SD, seqlength)),  : 
  Type of RHS ('double') must match LHS ('logical'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)
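The error message reports a mismatch between 'logical' and 'double'. In R a bare NA is logical, while NA_real_ is its double counterpart, so one possible factor (an assumption, not confirmed against the actual table) is the final return(NA) branch fixing the column type as logical before other rows yield doubles. The sketch below shows one hedged way to sidestep both the type question and the row-by-row if chain: start from a numeric NA vector and fill it using logical masks. The name newfunc_vec and the sample values are hypothetical, and only cases 1 to 4 are shown, with formulas copied as written in newfunc.

```r
# Vectorized sketch of the first four cases of newfunc (hypothetical name).
# i, j, p are assumed to be numeric vectors of equal length.
newfunc_vec <- function(i, j, p, seqlength) {
  out <- rep(NA_real_, length(i))      # numeric NA, so the result is always double

  d <- abs(j - p)
  min_dist <- pmin(d, seqlength - d)   # circular distance between j and p

  # Cases 1 to 3 share one formula in newfunc.
  simple <- (j > p & p > i) | (i > j & j > p) | (p > i & i > j)
  out[simple] <- 2 * min_dist[simple] / seqlength

  # Case 4 (p > j > i) splits on a further condition.
  c4 <- p > j & j > i
  near <- c4 & (j - i) < seqlength / 2
  far <- c4 & !near
  out[near] <- 2 * (p[near] - i[near]) / seqlength
  out[far] <- seqlength - (2 * (p[far] - j[far])) / seqlength  # as written in newfunc

  out
}

# Hypothetical rows: j > p > i, then p > j > i, then no case matching.
res <- newfunc_vec(i = c(100, 10, 5), j = c(1450, 20, 5), p = c(200, 30, 5),
                   seqlength = 1500)
```

If this fits the data, and assuming i, j and p are numeric columns, it could be called without grouping as ijp[, col6 := newfunc_vec(i, j, p, seqlength)]. Cases 5 and 6 would follow the same masking pattern.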

Any insights on how I can improve newfunc to perform these calculations?
