Tuesday, 29 October 2019

How to display duplicate IDs along with duplicate data in Python pandas

I have a data frame that looks like this:

import pandas as pd

k = {'ID': [1, 2, 3, 4, 5, 6],
     'Name': ['John Danny', 'Micheal K', 'John Danny', 'jerred', 'John Danny', 'joe'],
     'phone': ['1111', '2222', '2233', '1111', '2222', '6666']}
df = pd.DataFrame(data=k)
df
    ID  Name        phone
    1   John Danny  1111
    2   Micheal K   2222
    3   John Danny  2233
    4   jerred      1111
    5   John Danny  2222
    6   joe         6666

I need to find the duplicates in Name and phone in the data frame, so I used the code below:

df[df['Name'].duplicated(keep=False)].sort_values("Name")
df[df['phone'].duplicated(keep=False)].sort_values("phone")

Duplicates based on Name:

    ID  Name        phone
    1   John Danny  1111
    3   John Danny  2233
    5   John Danny  2222

Duplicates based on phone:

    ID  Name        phone
    1   John Danny  1111
    4   jerred      1111
    2   Micheal K   2222
    5   John Danny  2222
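
For reference, the filter above depends on `keep=False`. A minimal sketch of the boolean mask behind it, using only the Name column from the question (the variable names `names`, `first_kept` and `all_flagged` are illustrative, not from the original code):

```python
import pandas as pd

names = pd.Series(['John Danny', 'Micheal K', 'John Danny',
                   'jerred', 'John Danny', 'joe'])

# Default keep='first': the first occurrence of each value is NOT flagged.
first_kept = names.duplicated()
print(first_kept.tolist())   # [False, False, True, False, True, False]

# keep=False: every member of a duplicate group is flagged,
# which is why ID 1 appears in the filtered output as well.
all_flagged = names.duplicated(keep=False)
print(all_flagged.tolist())  # [True, False, True, False, True, False]
```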

But I want the result to look like this:

    ID  Name        phone  duplicated of name ids  duplicated of phone ids  Duplicate_name  Duplicate_phone
    1   John Danny  1111   3,5                     4                        Yes             Yes
    2   Micheal K   2222                           5                        No              Yes
    3   John Danny  2233   1,5                                              Yes             No
    4   jerred      1111                           1                        No              Yes
    5   John Danny  2222   1,3                     2                        Yes             Yes

I was able to compute Duplicate_name and Duplicate_phone with the code below:

df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})

The problem is that I was not able to display the IDs in the "duplicated of name ids" and "duplicated of phone ids" columns as shown in the result table above. How can I do this?
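
One possible approach, as a minimal sketch rather than a definitive answer: group the IDs by each column's value, then for every row join the IDs of the other rows in the same group. The helper `other_ids` is a hypothetical name introduced here. The sketch assumes the ID values are unique and that Name and phone contain no missing values (`groupby` drops NaN keys by default).

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Name': ['John Danny', 'Micheal K', 'John Danny', 'jerred', 'John Danny', 'joe'],
    'phone': ['1111', '2222', '2233', '1111', '2222', '6666'],
})

def other_ids(frame, col):
    """Hypothetical helper: for each row, the comma-joined IDs of the
    OTHER rows that share the same value in `col` ('' if there are none)."""
    # Map each distinct value of `col` to the list of IDs carrying it.
    ids_by_value = frame.groupby(col)['ID'].agg(list)
    return pd.Series(
        [','.join(str(i) for i in ids_by_value[value] if i != own_id)
         for own_id, value in zip(frame['ID'], frame[col])],
        index=frame.index,
    )

df['duplicated of name ids'] = other_ids(df, 'Name')
df['duplicated of phone ids'] = other_ids(df, 'phone')

# Yes/No flags, computed as in the question.
df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True: 'Yes', False: 'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True: 'Yes', False: 'No'})

print(df)
```

Rows without duplicates get an empty string in the ID columns, which matches the blank cells in the desired table.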
