lundi 4 juin 2018

Find duplicates within subgroups with conditions

I am having trouble with counting instances where a specific value appears within a subgroup of a group.

An example of the data are below:

GROUP   ID  DATA  
1       1   Main St    
1       1   1st St
1       2   Main St
1       2   
2       1   Main St
2       1   Main St
2       1   
2       2   1st St
2       2   

Context: I am resolving duplicates within my data, there are instances where the same person has two separate IDs and records. I ran an automated check to ID possible duplicates, which are tagged by the GROUP field; the ID field are the individual ID numbers. Address information is a very easy way to pick out duplicates.

I need to write something which does the following:

For each GROUP number, look at all the unique IDs within. For each unique ID within a group, look to see if the address in the DATA field appears for any other of the unique IDs within the group, but not the ID number in that specific row. Any matches should be flagged 1 / TRUE / etc.

I have written a COUNTIFS formula that gets most of the way there, but it does not exclude the row of the formula.

It is COUNTIFS(A:A,A2, B:B,"<>&", C:C,C2)

It should, for the first row, produce a 1, b/c Main St appears in Group1 ID1, and also in Group1 ID2. I cannot get it to ignore the Group1 ID1 row, I am stuck with the formula giving me a 2 for a result, b/c Main Street appears twice in Group1.

Any help is appreciated.

Thank you!

Aucun commentaire:

Enregistrer un commentaire