I want to apply a function to every row: sort a subset of that row's columns, compute a value that depends on an if/else condition, and then populate a new column in the data frame with the results. I have over two million rows, so a for loop is very inefficient.
Given the following data frame (edited for r2evans):
df <- as.data.frame(cbind(matrix(LETTERS[1:3], ncol=1), matrix(sample.int(100, size=15), nrow=3, ncol=5)))
> df
  V1 V2 V3 V4 V5 V6
1  A 77 79 32  5  4
2  B 57 24 68 65 45
3  C 66 60 82 74 22
Is there a way to apply the logic of the following loop to each row without a for loop? My actual data has over 2630800 rows.
df$num <- 0
df[2:7] <- sapply(df[2:7],as.numeric)
names(df) <- c("first_name", "sec", "A", "B", "C", "D", "num")
The column names are required for the if statement below (also edited to sort only once):
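for (i in seq_len(nrow(df))) {
  # sort the four value columns of row i as a named numeric vector
  row <- sort(unlist(df[i, 3:6]))
  if (df[i, 1] == names(row)[4]) {
    # the row's own column holds the largest value: use the second largest
    df$num[i] <- row[[3]] / (row[[3]] + row[[4]])
  } else {
    # otherwise use the largest value
    df$num[i] <- row[[4]] / (row[[3]] + row[[4]])
  }
}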
so that I get this outcome:
> df
  first_name sec  A  B  C  D       num
1          A  77 79 32  5  4 0.2882883
2          B  57 24 68 65 45 0.4887218
3          C  66 60 82 74 22 0.5256410
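For reference, the same result can probably be computed without any per-row iteration by working on the numeric columns as a matrix. The following is only a sketch under some assumptions: the example data is typed in with the fixed values shown above rather than sampled, and the intermediate names (vals, top_idx, top_val, second_val, own_is_top) are made up for illustration. It relies on base R's max.col() and pmax().

```r
# Example data with the fixed values shown above (no random sampling)
df <- data.frame(
  first_name = c("A", "B", "C"),
  sec = c(77, 57, 66),
  A = c(79, 24, 60),
  B = c(32, 68, 82),
  C = c(5, 65, 74),
  D = c(4, 45, 22),
  stringsAsFactors = FALSE
)

# Numeric matrix of the four value columns
vals <- as.matrix(df[, c("A", "B", "C", "D")])
rows <- seq_len(nrow(vals))

# Column index and value of the largest entry in each row
top_idx <- max.col(vals, ties.method = "first")
top_val <- vals[cbind(rows, top_idx)]

# Mask the largest entry, then take the row-wise maximum again
# to obtain the second-largest value
masked <- vals
masked[cbind(rows, top_idx)] <- -Inf
second_val <- pmax(masked[, 1], masked[, 2], masked[, 3], masked[, 4])

# TRUE where the column named in first_name holds the largest value
own_is_top <- colnames(vals)[top_idx] == df$first_name

# Same if/else as the loop: second largest if own column is on top,
# otherwise the largest, each divided by the sum of the top two
df$num <- ifelse(own_is_top, second_val, top_val) / (second_val + top_val)
```

When the two largest values in a row are tied, both branches give 0.5, so the choice of ties.method should not change the result.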
I'm not sure how to do this with apply. I was thinking of something like the following, but it does not work, and I am not sure how to incorporate the if/else condition:
df$num <- apply(df, 1, function(x) unlist(x[3:6][order(x[3:6])][3]/(x[3:6][order(x[3:6])][3]+x[3:6][order(x[3:6])][4])))
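One likely reason the attempt fails is that apply() converts the whole data frame to a matrix first; because first_name is a character column, every value becomes a string and the division raises an error. A possible way around this, sketched below, is to pass only the numeric columns as a matrix and hand first_name to the function separately. The helper name calc_num is hypothetical, and the example data is typed in with the fixed values from the output above.

```r
# Example data with the fixed values shown above (no random sampling)
df <- data.frame(
  first_name = c("A", "B", "C"),
  sec = c(77, 57, 66),
  A = c(79, 24, 60),
  B = c(32, 68, 82),
  C = c(5, 65, 74),
  D = c(4, 45, 22),
  stringsAsFactors = FALSE
)

# Keep only the numeric value columns so nothing is coerced to character
vals <- as.matrix(df[, c("A", "B", "C", "D")])

# calc_num is a hypothetical helper holding the body of the loop:
# x is a named numeric vector (one row), own is that row's first_name
calc_num <- function(x, own) {
  s <- sort(x)              # ascending; column names travel with the values
  n <- length(s)
  top <- s[[n]]             # largest value
  second <- s[[n - 1]]      # second-largest value
  if (names(s)[n] == own) {
    second / (second + top)
  } else {
    top / (second + top)
  }
}

# vapply over row indices so both the row values and first_name are available
df$num <- vapply(
  seq_len(nrow(vals)),
  function(i) calc_num(vals[i, ], df$first_name[i]),
  numeric(1)
)
```

This still calls an R function once per row, so it mainly tidies the loop rather than removing the per-row cost; on millions of rows the gain over the for loop may be modest.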