I have a data.table and I'm trying to create a new column by checking to see if a row has particular values in any of a given set of columns.
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA
What I want: if any of the "drug" columns == "Macrolides" AND any of the same columns == "Cephalosporins", then my new column "correct" should be 1; otherwise "correct" should be 0 (or it could be logical), like so:
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12 correct
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA 1
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA 0
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA 0
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA 1
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA 0
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA 0
I've tried both of these (but am still learning how to decipher warning messages so those don't help much, especially as I am new to data.table):
> d1$correct<-ifelse(d1[,c(3:14)]=="Macrolides" | d1[,c(3:14)]=="Cephalosporins", 1, 0)
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
12 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(x, j = name, value = value) :
Supplied 56868 items to be assigned to 4739 items of column 'correct' (52129 unused)
>
>
> selected_cols<-c("drug1", "drug2", "drug3", "drug4", "drug5", "drug6", "drug7", "drug8", "drug9", "drug10", "drug11", "drug12")
> d1$correct<-ifelse(d1 %in% selected_cols=="Macrolides" | d1 %in% selected_cols=="Cephalosporins", 1, 0)
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
Supplied 16 items to be assigned to 4739 items of column 'correct' (recycled leaving remainder of 3 items).
The closest I've gotten is this:
d1$correct<-apply(d1, 1, function(r) any(r %in% c("Macrolides", "Cephalosporins")))
This gives TRUE if either of those values appears anywhere in the row, but I can't figure out how to require that both of them appear. I'd prefer not to use a stunningly massive ifelse statement, since I have 12 columns and more combinations I'll need to make, and the NAs throw it off anyway.
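For reference, the smallest change to the apply() attempt that I can think of is to flip the test around: instead of asking whether any value in the row is one of the targets, ask whether all of the targets are found in the row. This is only a sketch on made-up toy data; the names targets and drug_cols are my own, and I'm assuming the drug columns can be picked out by the "drug" prefix.

```r
library(data.table)

# Toy stand-in for d1 (values are made up for illustration).
d1 <- data.table(
  pat_key = 1:3,
  drug1   = c("Cephalosporins", "Cephalosporins", "Anti-infectives"),
  drug2   = c("Macrolides", "Penicillins", "Macrolides"),
  drug3   = rep(NA_character_, 3)
)

targets   <- c("Macrolides", "Cephalosporins")
# Assumption: every column of interest starts with "drug".
drug_cols <- grep("^drug", names(d1), value = TRUE)

# all(targets %in% r) is TRUE only when every target appears in the row.
# %in% never returns NA, so the NA cells do not disturb the result.
d1$correct <- apply(d1[, drug_cols, with = FALSE], 1,
                    function(r) all(targets %in% r))
```

On the toy data only the first row has both drugs, so only that row comes out TRUE. Restricting apply() to the drug columns also keeps the key columns out of the comparison.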
I'd love a dplyr or data.table solution since those are so elegant, but at this point I'm desperate.
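To make the target concrete, here is a rough data.table sketch of the kind of thing I am after, run on a cut-down toy version of the table. The helper has_drug() is hypothetical (my own name, not part of data.table), and I'm again assuming the drug columns all start with "drug". It works column by column rather than row by row, so it should avoid a row-wise apply().

```r
library(data.table)

# Cut-down toy version of d1 (fewer columns, illustrative values).
d1 <- data.table(
  MEDREC_KEY = c(-140665983, -606290573, -615873176, -616819481),
  drug1 = c("Anti-infectives", "Anti-infectives", "Cephalosporins", "Anti-infectives"),
  drug2 = c("Cephalosporins", "Beta-lactams", "Penicillins", "Cephalosporins"),
  drug3 = c("Macrolides", "Cephalosporins", NA, "Macrolides"),
  drug4 = c(NA, "Penicillins", NA, "Quinolones")
)

# Assumption: every column of interest starts with "drug".
drug_cols <- grep("^drug", names(d1), value = TRUE)

# Hypothetical helper: TRUE for each row where `value` occurs in at least
# one of `cols`. %in% returns FALSE (not NA) for missing cells.
has_drug <- function(dt, cols, value) {
  hits <- lapply(dt[, cols, with = FALSE], function(x) x %in% value)
  Reduce(`|`, hits)
}

# A row is "correct" only if both drugs appear somewhere in the drug columns.
d1[, correct := as.integer(
  has_drug(d1, drug_cols, "Macrolides") &
  has_drug(d1, drug_cols, "Cephalosporins")
)]
```

With this layout, other combinations would just be further has_drug() calls joined with & or |, and dropping as.integer() would leave the column logical.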