lundi 29 mai 2017

Nested ifelse: improved syntax

Description

ifelse() function allows to filter the values in a vector through a series of tests, each of them producing different actions in case of a positive result. For instance, let xx be a data.frame, as follows:

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx

a b
1 1
2 2
1 3
3 4

Suppose that you want to create a new column, c, from column b, but depending on the values in column a in the following way:

For each row,

  • if the value in column a is 1, the value in column c, is the same value in column b.
  • if the value in column a is 2, the value in column c, is 100 times the value in column b.
  • in any other case, the value in column c is the negative of the value in column b.

Using ifelse(), a solution could be:

xx$c <- ifelse(xx$a==1, xx$b, 
               ifelse(xx$a==2, xx$b*100,
                      -xx$b))
xx

a b c
1 1 1
2 2 200
1 3 3
3 4 -4

Problem 1

An aesthetic problem arises when the number of tests increases, say, four tests:

xx$c <- ifelse(xx$a==1, xx$b, 
           ifelse(xx$a==2, xx$b*100,
                  ifelse(xx$a==3, ...,
                         ifelse(xx$a==4, ...,
                                ...))))

I found partial solution to the problem in this page, which consists in the definition of the functions if.else_(), i_(), e_(), as follows:

library(lazyeval)
i_ <- function(if_stat, then) {
    if_stat <- lazyeval::expr_text(if_stat)
    then    <- lazyeval::expr_text(then)
    sprintf("ifelse(%s, %s, ", if_stat, then)
}

e_ <- function(else_ret) {
    else_ret <- lazyeval::expr_text(else_ret)
    else_ret
}

if.else_ <- function(...) {
    args <- list(...)

    for (i in 1:(length(args) - 1) ) {
        if (substr(args[[i]], 1, 6) != "ifelse") {
            stop("All but the last argument, need to be if.then_ functions.", call. = FALSE)
        }
    }
    if (substr(args[[length(args)]], 1, 6) == "ifelse"){
        stop("Last argument needs to be an else_ function.", call. = FALSE)
    }
    args$final <- paste(rep(')', length(args) - 1), collapse = '')
    eval_string <- do.call('paste', args)
    eval(parse(text = eval_string))
}

In this way, the problem given in the Description, can be rewritten as follows:

xx <- data.frame(a=c(1,2,1,3), b=1:4)
xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    e_(-xx$b)
) 
xx

a b c
1 1 1
2 2 200
1 3 3
3 4 -4

And the code for the four tests will simply be:

xx$c <- if.else_(
    i_(xx$a==1, xx$b),
    i_(xx$a==2, xx$b*100),
    i_(xx$a==3, ...), # dots meaning actions for xx$a==3
    i_(xx$a==4, ...), # dots meaning actions for xx$a==4
    e_(...)           # dots meaning actions for any other case
) 

Problem 2 & Question

The given code apparently solves the problem. Then, I wrote the following test function:

test.ie <- function() {
    dd <- data.frame(a=c(1,2,1,3), b=1:4)
    if.else_(
        i_(dd$a==1, dd$b),
        i_(dd$a==2, dd$b*100),
        e_(-dd$b)
    ) # it should give c(1, 200, 3, -4)
}

When I tried the test:

 test.ie()

it spit the following error message:

Error in ifelse(dd$a == 1, dd$b, ifelse(dd$a == 2, dd$b * 100, -dd$b)) :
object 'dd' not found

Question

Since the if.else_() syntactic constructor is not supposed to run only from the console, is there a way for it to 'know' the variables from the function that calls it?

Aucun commentaire:

Enregistrer un commentaire