I am removing duplicates from a scraped dataset. I want to create a dummy variable that flags every observation that matches at least one other observation on all variables, or on all variables but one.
Here's an example of my dataset:
Postcode | nrooms | price | sqm |
---|---|---|---|
76 | 1 | 259 | 30 |
75 | 5 | 380 | 120 |
75 | 5 | 400 | 120 |
75 | 2 | 450 | 80 |
76 | 1 | 259 | 30 |
Here's the dummy I want:
Postcode | nrooms | price | sqm | dummy |
---|---|---|---|---|
76 | 1 | 259 | 30 | 1 |
75 | 5 | 380 | 120 | 1 |
75 | 5 | 400 | 120 | 1 |
75 | 2 | 450 | 80 | 0 |
76 | 1 | 259 | 30 | 1 |
The first and last rows have the same values on all variables, while the second and third rows have the same values on all variables but one (the price).
Could someone help me with this?
Thanks!
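The post does not name a language, so the following is only a minimal sketch of one possible approach, assuming Python with pandas. The function name `flag_near_duplicates` is hypothetical. The idea is to leave out one column at a time and mark the rows that are duplicated on the remaining columns; rows that are identical on every column are also caught by the explicit full-row check.

```python
import pandas as pd


def flag_near_duplicates(df: pd.DataFrame) -> pd.Series:
    """Return 1 for rows that match another row on all columns,
    or on all columns but one; 0 otherwise.
    (Hypothetical helper, not part of pandas.)"""
    cols = list(df.columns)
    # Rows that are identical to some other row on every column.
    flag = df.duplicated(keep=False)
    for col in cols:
        # Ignore one column at a time and look for duplicates on the rest.
        others = [c for c in cols if c != col]
        if others:
            flag |= df.duplicated(subset=others, keep=False)
    return flag.astype(int)


# Example data copied from the table in the question.
df = pd.DataFrame({
    "Postcode": [76, 75, 75, 75, 76],
    "nrooms": [1, 5, 5, 2, 1],
    "price": [259, 380, 400, 450, 259],
    "sqm": [30, 120, 120, 80, 30],
})

df["dummy"] = flag_near_duplicates(df)
print(df)
```

The function compares every column of the frame it receives, so it should be called before the `dummy` column is added (or on a selection of the original columns only).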