Monday, 14 December 2015

Subset of a data frame based on multiple conditions

I'm having trouble with one particular task in my code. I have a data frame defined as

n  <- 6
set.seed(123)
df <- data.frame(x=paste0("x",seq_len(n)), A=sample(-2:2,n,replace=TRUE), B=sample(-1:3,n,replace=TRUE))
#
#    x  A B
# 1 x1 -1 1
# 2 x2  1 3
# 3 x3  0 1
# 4 x4  2 1
# 5 x5  2 3
# 6 x6 -2 1

and a decision tree stored in a semicolon-separated file as

A>0;Y;Y;N;N
B>1;Y;N;Y;N
C;1;2;2;1

which I load with

dt <- read.csv2("tmp.csv", header=FALSE)
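To reproduce this without a file on disk, the same table can be read from an inline string. This is only a sketch, assuming tmp.csv contains exactly the three lines shown above; the name `tree_text` and the `stringsAsFactors = FALSE` argument are additions for illustration, not part of the original code.

```r
# Assumption: tmp.csv holds exactly the three semicolon-separated lines above.
tree_text <- "A>0;Y;Y;N;N
B>1;Y;N;Y;N
C;1;2;2;1"

# read.csv2() uses ';' as the field separator; text= reads from a string
# instead of a file. stringsAsFactors = FALSE keeps every cell as character.
dt <- read.csv2(text = tree_text, header = FALSE, stringsAsFactors = FALSE)

dim(dt)    # 3 rows, 5 columns
dt[1, 1]   # "A>0"
```

Because each column mixes "Y"/"N" with numbers, every column comes back as character, so the C values in the last row need `as.integer()` before being used as numbers.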

I'd like to loop over all possible combinations of (A>0) and (B>1) and, for each combination, assign the corresponding C value to the rows of df that satisfy it. Here's what I did:

nr <- 3
nc <- 5

cond <- dt[1:(nr-1),1,drop=FALSE]
rule <- dt[nr,1,drop=FALSE]

subdf <- vector(mode="list",2^(nr-1))

for (i in 2:nc) {
  check <- paste0("")
  for (j in 1:(nr-1)) {
    case <- paste0(dt[j,1])
    if (dt[j,i]=="N")
      case <- paste0("!",case)
    check <- paste0(check, "(", case, ")" )

    if (j<(nr-1))
      check <- paste0(check, "&")

  }

  subdf[i]   <- subset(df,check)
  subdf[i]$C <- dt[nr,i]

}
unlist(subdf)

Unfortunately, subset() throws an error here, because it cannot parse the condition from my string. What should I do?
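One possible approach, sketched here under some assumptions: subset() expects a logical expression rather than a character string, so the string can first be turned into an expression with parse(text = ...) and then evaluated against the data frame with eval(). The sketch also stores each piece with `[[ ]]` and combines the pieces with rbind() instead of unlist(). The data frame is typed in literally with the values printed in the question, and the decision table is read from an inline string; the names `cases`, `keep`, `part`, and `result` are hypothetical.

```r
# Data frame with the values printed in the question, typed in literally so
# the result does not depend on the random number generator.
df <- data.frame(x = paste0("x", 1:6),
                 A = c(-1, 1, 0, 2, 2, -2),
                 B = c(1, 3, 1, 1, 3, 1),
                 stringsAsFactors = FALSE)

# Decision table, read from an inline string instead of tmp.csv (assumption).
dt <- read.csv2(text = "A>0;Y;Y;N;N\nB>1;Y;N;Y;N\nC;1;2;2;1",
                header = FALSE, stringsAsFactors = FALSE)

nr <- nrow(dt)   # the last row holds the value of C
nc <- ncol(dt)   # the first column holds the condition text

subdf <- vector("list", nc - 1)

for (i in 2:nc) {
  # Wrap each condition in parentheses, negating it when the cell is "N".
  cases <- paste0(ifelse(dt[1:(nr - 1), i] == "N", "!", ""),
                  "(", dt[1:(nr - 1), 1], ")")
  check <- paste(cases, collapse = " & ")

  # parse() turns the string into an expression; eval() evaluates it with
  # the columns of df in scope, giving one TRUE/FALSE per row.
  keep <- eval(parse(text = check), envir = df)

  part <- df[keep, , drop = FALSE]
  # rep() keeps the assignment valid even when no row matches.
  part$C <- rep(as.integer(dt[nr, i]), nrow(part))
  subdf[[i - 1]] <- part
}

# Stack the pieces back into one data frame and restore the original order.
result <- do.call(rbind, subdf)
result <- result[order(result$x), ]
```

With these values, x4 ends up with C = 2 and every other row with C = 1; the combination "not A>0 and B>1" matches no rows. Note that eval(parse(text = ...)) executes whatever the file contains, so this is only reasonable when the rule file is trusted.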
