Thursday, 28 December 2017

Generate a column in RStudio if other columns are equal

I have two dataframes:

df1 <- data.frame('ID'=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 
              'invoice'=c(24000, 25000, 26000, 27000, 28000, 29000, 30000, 31000, 32000, 33000),
              'settle'=c(40000, 41000, 42000, 43000, 44000, 45000, 46000, 47000, 48000, 49000), 
              'amount'=c(10, 20, 30, 10, 20, 30, 10, 20, 30, 10), 
              'reason'=c(4, 5, 9, 4, 5, 9, 4, 5, 15, 8))

And:

df2 <- data.frame('ID'=c(1, 2, 4, 5, 7, 8, 11, 12), 
              'invoice'=c(40000, 41000, 43000, 44000, 46000, 47000, 40000, 41000),
              'settle'=c(24000, 25000, 27000, 28000, 30000, 31000, 24000, 25000),
              'amount'=c(10, 20, 10, 20, 10, 20, 10, 10), 
              'reason'=c(4, 5, 4, 5, 4, 5, 4, 4))

df1:

   ID invoice settle amount reason
   1   24000  40000     10      4
   2   25000  41000     20      5
   3   26000  42000     30      9
   4   27000  43000     10      4
   5   28000  44000     20      5
   6   29000  45000     30      9
   7   30000  46000     10      4
   8   31000  47000     20      5
   9   32000  48000     30     15
  10   33000  49000     10      8

df2:

ID invoice settle amount reason
 1   40000  24000     10      4
 2   41000  25000     20      5
 4   43000  27000     10      4
 5   44000  28000     20      5
 7   46000  30000     10      4
 8   47000  31000     20      5 
11   40000  24000     10      4
12   41000  25000     10      4

So I'd like to generate a dummy variable in df1 from the following conditions:

if df1$ID == df2$ID
if df1$settle == df2$invoice
if df1$amount == df2$amount
if df1$reason == df2$reason

So if all four conditions are met, my new column should equal 1; otherwise it should equal 0.

I've tried:

 df1$newvar <- ifelse(df1$ID == df2$ID & 
                      df1$settle == df2$invoice &
                      df1$amount == df2$amount &
                      df1$reason == df2$reason, 1, 0)

I get the warning message:

 "longer object length is not a multiple of shorter object length"

So I guess ifelse won't work here, since my two dataframes aren't the same size (df1 has more rows than df2).
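For context, the warning comes from R's vector recycling: `==` compares element by element and, when the lengths differ, reuses the shorter vector from its start. A small illustration using only the two ID columns (the names `id1`, `id2`, and `cmp` are made up for this sketch):

```r
# ID columns copied from df1 and df2 above
id1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
id2 <- c(1, 2, 4, 5, 7, 8, 11, 12)

# `==` is positional: id2 is recycled to length 10 (1, 2, 4, ..., 12, 1, 2),
# so id1[3] is compared with id2[3] (3 vs 4), not with every value in id2.
# suppressWarnings() only hides the length warning for this demo.
cmp <- suppressWarnings(id1 == id2)
which(cmp)          # positions where the values happen to line up: 1 2

# %in% asks the intended question: does each id1 value occur anywhere in id2?
which(id1 %in% id2) # 1 2 4 5 7 8
```

So the positional comparison only finds IDs 1 and 2, even though six of the df1 IDs occur in df2.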

Can you help me solve this problem?
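One possible approach, sketched under the assumption that a row of df1 counts as a match when a single row of df2 satisfies all four conditions at once: check each row of df1 against every row of df2 and collapse the result with `any()`. The helper name `row_matches` is hypothetical, and the data are repeated only so the block runs on its own.

```r
df1 <- data.frame(ID = 1:10,
                  invoice = seq(24000, 33000, by = 1000),
                  settle  = seq(40000, 49000, by = 1000),
                  amount  = c(10, 20, 30, 10, 20, 30, 10, 20, 30, 10),
                  reason  = c(4, 5, 9, 4, 5, 9, 4, 5, 15, 8))

df2 <- data.frame(ID = c(1, 2, 4, 5, 7, 8, 11, 12),
                  invoice = c(40000, 41000, 43000, 44000, 46000, 47000, 40000, 41000),
                  settle  = c(24000, 25000, 27000, 28000, 30000, 31000, 24000, 25000),
                  amount  = c(10, 20, 10, 20, 10, 20, 10, 10),
                  reason  = c(4, 5, 4, 5, 4, 5, 4, 4))

# Hypothetical helper: does row i of df1 have a df2 row meeting all conditions?
# Each comparison is scalar-vs-vector, so no recycling warning is raised.
row_matches <- function(i) {
  hit <- df2$ID      == df1$ID[i] &
         df2$invoice == df1$settle[i] &
         df2$amount  == df1$amount[i] &
         df2$reason  == df1$reason[i]
  as.integer(any(hit))
}

# vapply guarantees an integer vector with one value per df1 row
df1$newvar <- vapply(seq_len(nrow(df1)), row_matches, integer(1))
df1$newvar  # 1 1 0 1 1 0 1 1 0 0
```

For larger data, a `merge()` on the four key columns would likely scale better; the row-wise version is shown because it stays closest to the original ifelse attempt.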

In SPSS or Stata I'd just use the IF command, but R is pretty new to me!

Thanks! :)
