Wednesday, March 29, 2017

Combining ifelse and any to create a new column (R 3.3.2, Windows 7)

I have a data.table, and I'm trying to create a new column by checking whether a row contains particular values in any of a given set of columns.

head(d1)

   MEDREC_KEY   pat_key           drug1          drug2          drug3       drug4        drug5       drug6      drug7     drug8 drug9 drug10 drug11 drug12
1: -140665983 669723105 Anti-infectives Cephalosporins     Ethambutol   Isoniazid   Macrolides Penicillins Quinolones Rifamycin    NA     NA     NA     NA
2: -606290573  85924804 Anti-infectives   Beta-lactams Cephalosporins Penicillins   Quinolones          NA         NA        NA    NA     NA     NA     NA
3: -615873176 161009395  Cephalosporins    Penicillins             NA          NA           NA          NA         NA        NA    NA     NA     NA     NA
4: -616819481  36280536 Anti-infectives Cephalosporins     Macrolides  Quinolones           NA          NA         NA        NA    NA     NA     NA     NA
5: -625709819 720290063 Anti-infectives Cephalosporins     Ethambutol   Isoniazid Pyrazinamide  Quinolones  Rifamycin        NA    NA     NA     NA     NA
6: -637094857 720918635 Anti-infectives    Penicillins     Quinolones          NA           NA          NA         NA        NA    NA     NA     NA     NA

What I want is this: if any of the "drug" columns equals "Macrolides" AND any of the same columns equals "Cephalosporins", then my new column "correct" should be 1; otherwise it should be 0 (a logical TRUE/FALSE would also work), like so:

head(d1)
   MEDREC_KEY   pat_key           drug1          drug2          drug3       drug4        drug5       drug6      drug7     drug8 drug9 drug10 drug11 drug12 correct
1: -140665983 669723105 Anti-infectives Cephalosporins     Ethambutol   Isoniazid   Macrolides Penicillins Quinolones Rifamycin    NA     NA     NA     NA   1
2: -606290573  85924804 Anti-infectives   Beta-lactams Cephalosporins Penicillins   Quinolones          NA         NA        NA    NA     NA     NA     NA   0
3: -615873176 161009395  Cephalosporins    Penicillins             NA          NA           NA          NA         NA        NA    NA     NA     NA     NA   0
4: -616819481  36280536 Anti-infectives Cephalosporins     Macrolides  Quinolones           NA          NA         NA        NA    NA     NA     NA     NA   1
5: -625709819 720290063 Anti-infectives Cephalosporins     Ethambutol   Isoniazid Pyrazinamide  Quinolones  Rifamycin        NA    NA     NA     NA     NA   0
6: -637094857 720918635 Anti-infectives    Penicillins     Quinolones          NA           NA          NA         NA        NA    NA     NA     NA     NA   0

I've tried both of the following, but I'm still learning to decipher warning messages, so they don't help me much, especially since I'm new to data.table:

> d1$correct<-ifelse(d1[,c(3:14)]=="Macrolides" | d1[,c(3:14)]=="Cephalosporins", 1, 0)
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
  12 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(x, j = name, value = value) :
  Supplied 56868 items to be assigned to 4739 items of column 'correct' (52129 unused)
> 
> 
> selected_cols<-c("drug1", "drug2", "drug3", "drug4", "drug5", "drug6", "drug7", "drug8", "drug9", "drug10", "drug11", "drug12")
> d1$correct<-ifelse(d1 %in% selected_cols=="Macrolides" | d1 %in% selected_cols=="Cephalosporins", 1, 0)
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
  Supplied 16 items to be assigned to 4739 items of column 'correct' (recycled leaving remainder of 3 items).
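Both warnings come from the shape of what the comparisons return, which a small toy example can show (the data below is hypothetical, not the real d1). Comparing a whole data frame with a string is done cell by cell and yields a logical matrix with one column per drug column, so ifelse() hands back 12 values per row instead of one. `%in%` applied to a data frame treats it as a list of columns and returns one value per column, which is then recycled down the new column. A sketch of one way to collapse the matrix to one value per row, using base R only, is rowSums() with na.rm = TRUE:

```r
# Hypothetical toy data laid out like the drug columns of d1, NA padding included.
toy <- data.frame(
  drug1 = c("Cephalosporins", "Cephalosporins", "Penicillins"),
  drug2 = c("Macrolides", "Penicillins", NA),
  drug3 = c(NA, "Quinolones", NA),
  stringsAsFactors = FALSE
)

# Attempt 1: the comparison is done cell by cell, giving a logical matrix
# (rows x drug columns) rather than one value per row.
m <- toy == "Macrolides"
dim(m)  # 3 3

# Attempt 2: %in% treats the data frame as a list of columns and
# returns one value per column, again not one value per row.
col_test <- toy %in% c("drug1", "drug2", "drug3")
length(col_test)  # 3

# Summing each matrix row gives one value per row; na.rm = TRUE makes
# NA cells count as "no match" instead of turning the whole sum into NA.
has_macro <- rowSums(toy == "Macrolides", na.rm = TRUE) > 0
has_ceph  <- rowSums(toy == "Cephalosporins", na.rm = TRUE) > 0
toy$correct <- as.integer(has_macro & has_ceph)
toy$correct  # 1 0 0
```

The `&` between the two row-wise tests is what expresses "Macrolides AND Cephalosporins somewhere in the row".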

The closest I've gotten is this:

d1$correct<-apply(d1, 1, function(r) any(r %in% c("Macrolides", "Cephalosporins")))

This returns TRUE if either of those values appears anywhere in the row, but I can't figure out how to make it return TRUE only when both of them do. I'd prefer not to write a stunningly massive ifelse statement, since I have 12 columns and more combinations to build, and the NAs throw it off anyway.
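Swapping which side of `%in%` the row sits on turns the "either" test into a "both" test. A minimal sketch in base R, on hypothetical toy rows rather than the real d1: `any(r %in% required)` asks whether at least one required drug is in row r, while `all(required %in% r)` asks whether every required drug is.

```r
# Hypothetical toy rows shaped like the drug columns of d1.
toy <- data.frame(
  drug1 = c("Anti-infectives", "Anti-infectives", "Cephalosporins"),
  drug2 = c("Cephalosporins",  "Beta-lactams",    "Penicillins"),
  drug3 = c("Macrolides",      "Cephalosporins",  NA),
  stringsAsFactors = FALSE
)
required <- c("Macrolides", "Cephalosporins")

# all(required %in% r): TRUE only if every required drug occurs in row r.
# NA cells never equal a drug name, so they simply fail to match.
toy$correct <- as.integer(apply(toy, 1, function(r) all(required %in% r)))
toy$correct  # 1 0 0
```

On the real table you would likely pass only the drug columns, for example `apply(d1[, selected_cols, with = FALSE], 1, ...)`, so the numeric key columns are not converted to text along the way. apply() works row by row, which is fine for a few thousand rows but is typically slower than column-wise approaches on large tables.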

I'd love a dplyr or data.table solution since those are so elegant, but at this point I'm desperate.
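One possible data.table sketch works column-wise instead of row by row, assuming a data.table version that supports `:=` together with `.SDcols`. The helper `has_all_drugs()` and the toy table are hypothetical; `.SD`, `.SDcols` and `:=` are standard data.table features. For each required drug, `lapply(sd, `%in%`, drug)` tests every drug column (NA never matches), `Reduce(`|`, ...)` ORs those tests across columns, and `Reduce(`&`, ...)` then requires all drugs to be present.

```r
library(data.table)

# Hypothetical toy table with the same layout as d1.
d1 <- data.table(
  pat_key = 1:3,
  drug1 = c("Anti-infectives", "Anti-infectives", "Cephalosporins"),
  drug2 = c("Cephalosporins",  "Beta-lactams",    "Penicillins"),
  drug3 = c("Macrolides",      "Cephalosporins",  NA)
)
drug_cols <- grep("^drug", names(d1), value = TRUE)

# Hypothetical helper: 1L for rows in which every drug in `required`
# appears in at least one column of `sd`, otherwise 0L.
has_all_drugs <- function(sd, required) {
  # One logical vector per required drug: is it in any column of the row?
  per_drug <- lapply(required, function(drug) Reduce(`|`, lapply(sd, `%in%`, drug)))
  # Combine across drugs: all of them must be present.
  as.integer(Reduce(`&`, per_drug))
}

d1[, correct := has_all_drugs(.SD, c("Macrolides", "Cephalosporins")), .SDcols = drug_cols]
d1$correct  # 1 0 0
```

Further combinations then only need another call, e.g. `d1[, other_flag := has_all_drugs(.SD, c("Isoniazid", "Rifamycin")), .SDcols = drug_cols]`, where `other_flag` is a hypothetical column name.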
