I'm trying to build a factor X whose values depend on two (maybe more) columns in a data frame, so it can have more than two levels.
There is an easy way to do this with C/C++-like conditional statements in a for loop. Say I'm constructing X from the values in two boolean columns, Col1 and Col2, of a data frame MATRIX. I can do it easily as:
X = vector()
for (i in 1:nrow(MATRIX)) {
  if (MATRIX$Col1[i] == 1 && MATRIX$Col2[i] == 1) {
    X[i] = "both"
  } else if (MATRIX$Col1[i] == 1) {
    X[i] = "col1"
  } else if (MATRIX$Col2[i] == 1) {
    X[i] = "col2"
  } else {
    X[i] = "none"
  }
}
The problem, obviously, is that on large data frames this takes a long time to run. I should vectorize it to make it faster, but I cannot see how, since functions such as *apply, ifelse or any do not seem to help in such a task, where the result is not boolean.
Any ideas?
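For reference, here is a minimal sketch of what a vectorized version might look like, assuming Col1 and Col2 contain only 0/1 (or TRUE/FALSE) values with no NA. It relies on the fact that ifelse() can return character values, not only logicals. The sample data and the names X1, X2, idx and level_names are made up for illustration.

```r
# Hypothetical sample data; MATRIX, Col1 and Col2 follow the names used above.
MATRIX <- data.frame(Col1 = c(1, 1, 0, 0), Col2 = c(1, 0, 1, 0))

# Option 1: nested ifelse(), evaluated on whole columns at once.
# Note the single & (element-wise) instead of && (scalar only).
X1 <- ifelse(MATRIX$Col1 == 1 & MATRIX$Col2 == 1, "both",
      ifelse(MATRIX$Col1 == 1, "col1",
      ifelse(MATRIX$Col2 == 1, "col2", "none")))

# Option 2: encode the two columns as an index in 1..4 and look up the label.
# Col1 contributes 1, Col2 contributes 2, so both set gives 1 + 1 + 2 = 4.
level_names <- c("none", "col1", "col2", "both")
idx <- 1 + (MATRIX$Col1 == 1) + 2 * (MATRIX$Col2 == 1)
X2 <- level_names[idx]

# Convert the character vector to a factor with a fixed level order.
X <- factor(X2, levels = level_names)
```

Option 2 should extend to more columns by adding 4 * (Col3 == 1), 8 * (Col4 == 1) and so on, with a correspondingly longer vector of labels, though the labels then have to be listed in that binary order.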