jeudi 15 février 2018

Compare multiple column of dataframe with condition R

I have a dataframe (containing 200k Rows):

DF1>

ID   SR1    SR2      DRC1   DX2
1    123    as#12.c  ABC-1  SXI
2    124    ae&14.v  ABC-1  SXI
3    125    at$19.e  AXX-1    
4    125    at$19.e  AXX-1  SCV
5    785    ab&22.n  AWZ-2  DDF
6    849    ab&22.n  AWZ-5  DDF

For this, I want to add new column to DF1 as status based on below conditions all together:

  1. Check Every DX2 Value we have same value in DRC-1 (i.e For ID 1 and 2 we have same DRC1 value as ABC-1).
  2. For some cases i don't have DX2 Value, for those check SR-1 and SR-2 to compare DRC-1 value throughout the dataframe, if its same show True in Status else False.

Note: if any value either SR-1 or SR-2 Match with any row in the entire dataframe, (i.e. Row No. 4 in desired output)

  1. Where we don't have DX2 Value but when compare through dateframe using SR-1 and SR-2, and found some where we have DX2 value corresponding to SR-1 and SR-2 than give Status as True-ID or False-ID based on condition.

Desired Output:

ID   SR1    SR2      DRC1   DX2  Status
1    123    as#12.c  ABC-1  SXI  True
2    124    ae&14.v  ABC-1  SXI  True
3    125    at$19.e  AXX-1       True-4
4    125    at$19.d  AXX-1  SCV  True
5    785    ab&22.n  AWZ-2  DDF  False
6    849    ab&22.n  AWZ-5  DDF  False

So far i could compare only one column with below code:

New_DF<-transform(DF_1, Status = ave(as.character(DF_1$DRC1), DF_1$DX2, FUN = function(x) 
  if(length(unique(x)) == 1) "True" else "False" ))

In addition, Just wondering if the same can be done in MySQL.??

Aucun commentaire:

Enregistrer un commentaire