I have some data in a large data frame (about 80x300) that looks something like this:
dum <- data.frame(id=c("a", "b", "c", "d", "e"),
                  v1=c(2, 7, 8, 5, 0),
                  v2=c(9, 2, 4, 6, 1),
                  v3=c(2, 2, 6, 1, 7))
I would like to turn each variable into a dichotomous variable indicating whether or not each observation is in the top 20% of that variable. (I'll later merge the dummy data set with the raw data set, which is unimportant for now, but if anyone wants to get creative, that's the full plan.) The output data frame should look something like this:
id v1 v2 v3
a 0 1 0
b 0 0 0
c 1 0 0
d 0 0 0
e 0 0 1
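For reference, the cutoffs implied by this table can be sketched as follows. This assumes "top 20%" means "at or above the 80th percentile as computed by quantile() with its default interpolation"; the name cutoffs is only illustrative.

```r
# Example data from the question
dum <- data.frame(id = c("a", "b", "c", "d", "e"),
                  v1 = c(2, 7, 8, 5, 0),
                  v2 = c(9, 2, 4, 6, 1),
                  v3 = c(2, 2, 6, 1, 7))

top <- 20  # percentage of observations to flag

# One cutoff per numeric column: the (100 - top)th percentile.
# names = FALSE drops the "80%" label, so unname() is not needed.
cutoffs <- sapply(dum[-1], quantile, probs = (100 - top) / 100, names = FALSE)
print(cutoffs)  # approximately v1 = 7.2, v2 = 6.6, v3 = 6.2
```

Only one value in each column reaches its cutoff (8, 9, and 7 respectively), which matches the single 1 per column in the table above.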
My attempt at this looks like the following:
top <- 20 # set percentage
for(i in 2:ncol(dum)) {
  for(j in 1:nrow(dum)) {
    ifelse(dum[j,i]>=unname(quantile(dum[,i],probs=((100-top)/100))), dum[j,i]<-1, dum[j,i]<-0)
  }
}
However, when I run this code, I get more ones than desired in some cases and exactly the number I want in others. Instead of the output shown above, I get this:
id v1 v2 v3
a 0 1 0
b 0 0 0
c 1 0 0
d 1 1 0
e 0 1 1
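A likely explanation for the extra ones: the loop overwrites dum[j,i] with 0 or 1 and then recomputes quantile(dum[,i]) on the next pass, so the cutoff is calculated on a column that is already partly replaced. As the column fills with zeros and ones the cutoff drops, and later raw values such as 5 or 6 end up above it. A minimal sketch of the same nested loop with the cutoff computed once per column from the untouched data (the names fixed and cutoff are illustrative, not from the original code):

```r
# Example data from the question
dum <- data.frame(id = c("a", "b", "c", "d", "e"),
                  v1 = c(2, 7, 8, 5, 0),
                  v2 = c(9, 2, 4, 6, 1),
                  v3 = c(2, 2, 6, 1, 7))

top <- 20      # percentage of observations to flag
fixed <- dum   # write results to a copy so the raw values stay intact

for (i in 2:ncol(dum)) {
  # Compute the cutoff once, from the original column, before any overwriting
  cutoff <- quantile(dum[, i], probs = (100 - top) / 100, names = FALSE)
  for (j in 1:nrow(dum)) {
    # Plain if/else with an explicit assignment on the left-hand side
    fixed[j, i] <- if (dum[j, i] >= cutoff) 1 else 0
  }
}

print(fixed)
```

On the example data this gives one 1 per column (rows c, a, and e), matching the desired output.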
Can anyone help identify where I am going wrong? A few notes: 1) I am prepared to get hated on for using loops, especially nested loops, but it's something I'm familiar with and computational time is not a concern here. 2) Based on my googling, it seems the apply family of functions could be useful, but I'm not very familiar with them so I wouldn't know where to begin. 3) I included the unname() command as an attempted fix, but it runs the same with or without it. 4) The YES/NO part of the ifelse() statement looks funny to me, but when I tried ifelse(cond, 1, 0) it didn't make any changes to the data frame, and I didn't understand why.
Thanks!
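On notes 2 and 4: ifelse(cond, 1, 0) only returns a value and does not modify anything on its own, so its result has to be assigned back to the data frame. A hedged sketch of a loop-free version using lapply() follows; the helper to_dummy and the result name dummies are hypothetical, and the >= comparison is carried over from the loop above, so tied values at the cutoff would all be flagged.

```r
# Example data from the question
dum <- data.frame(id = c("a", "b", "c", "d", "e"),
                  v1 = c(2, 7, 8, 5, 0),
                  v2 = c(9, 2, 4, 6, 1),
                  v3 = c(2, 2, 6, 1, 7))

top <- 20  # percentage of observations to flag

# Hypothetical helper: 1 if the value is at or above the column's
# (100 - pct)th percentile, 0 otherwise. The comparison is vectorised,
# so the whole column is handled in one call.
to_dummy <- function(x, pct) {
  as.integer(x >= quantile(x, probs = (100 - pct) / 100, names = FALSE))
}

dummies <- dum                                        # keep the raw data untouched
dummies[-1] <- lapply(dum[-1], to_dummy, pct = top)   # every column except id

print(dummies)
```

Because each column is passed to to_dummy() whole, the cutoff is always computed from the raw values, and dum stays available for the later merge.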