Wednesday, 18 November 2020

Find mismatches in the same column considering variables in another column

Hi, I have the following data frame:

library(dplyr)  # provides the %>% pipe used below

# Build a character matrix row by row, then convert it to a data frame.
df <- rbind(c('John', '1', 'a', 'a'),
            c('John', '1', 'a', 'a'),
            c('David', '2', 'b', 'b'),
            c('David', '2', 'b', 'b'),
            c('Jack', '3', 'b', 'b'),
            c('Jack', '3', 'b', 'b'),
            c('David', '1', 'b', 'b'),
            c('Chris', '3', 'b', 'b'),
            c('Peter', '4', 'b', 'b')) %>%
    data.frame()

colnames(df) <- c('name', paste('t', 1:3, sep = ''))

   name t1 t2 t3
1  John  1  a  a
2  John  1  a  a
3 David  2  b  b
4 David  2  b  b
5  Jack  3  b  b
6  Jack  3  b  b
7 David  1  b  b
8 Chris  3  b  b
9 Peter  4  b  b

Here, each value in column 't1' is supposed to belong to exactly one name: 1 for John, 2 for David, 3 for Jack, and so on. If a single value of t1, say 1, corresponds to two different names, the data is wrong. I want to find the rows where a value in t1 is shared by more than one name. In this data frame, 1 is shared by John and David, and 3 is shared by Jack and Chris, so I want to retrieve the rows for those people.

I want output like this:

  name t1 t2 t3
  John  1  a  a
 David  1  b  b
  Jack  3  b  b
 Chris  3  b  b

Can this be done with dplyr/tidyverse or with base R? Does it require for loops? I am pretty new to R, so I am looking for a simple way to achieve this.

Thanks in advance!
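
One possible approach, sketched here rather than offered as a definitive answer: group the rows by t1, keep only the groups that contain more than one distinct name, and then drop exact duplicate rows. The object names conflicts, names_per_t1, bad_t1 and conflicts_base are made up for illustration, and the example data is rebuilt with data.frame() so the block runs on its own. A base R variant without for loops is included as well.

```r
library(dplyr)

# Rebuild the example data from the question (all columns as character).
df <- data.frame(
  name = c("John", "John", "David", "David", "Jack", "Jack", "David", "Chris", "Peter"),
  t1   = c("1", "1", "2", "2", "3", "3", "1", "3", "4"),
  t2   = c("a", "a", "b", "b", "b", "b", "b", "b", "b"),
  t3   = c("a", "a", "b", "b", "b", "b", "b", "b", "b")
)

# dplyr: keep only the t1 groups that contain more than one distinct name,
# then drop exact duplicate rows and sort by t1.
conflicts <- df %>%
  group_by(t1) %>%
  filter(n_distinct(name) > 1) %>%
  ungroup() %>%
  distinct() %>%
  arrange(t1)

# Base R variant without loops: count the distinct names per t1 value,
# then keep the rows whose t1 value has more than one name.
names_per_t1 <- tapply(df$name, df$t1, function(x) length(unique(x)))
bad_t1 <- names(names_per_t1)[names_per_t1 > 1]
conflicts_base <- unique(df[df$t1 %in% bad_t1, ])
conflicts_base <- conflicts_base[order(conflicts_base$t1), ]
```

For the example data, both results keep one row each for John and David (t1 = 1) and for Jack and Chris (t1 = 3). The distinct() and unique() steps are only there because the desired output lists each person once; leave them out if every original row should be returned.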
