I have a data frame that looks like this:
import pandas as pd

k = {'ID': [1, 2, 3, 4, 5, 6],
     'Name': ['John Danny', 'Micheal K', 'John Danny', 'jerred', 'John Danny', 'joe'],
     'phone': ['1111', '2222', '2233', '1111', '2222', '6666']}
df = pd.DataFrame(data=k)
df
ID  Name        phone
1   John Danny  1111
2   Micheal K   2222
3   John Danny  2233
4   jerred      1111
5   John Danny  2222
I need to find the duplicates in the Name and phone columns, so I used the code below:
df[df['Name'].duplicated(keep=False)].sort_values("Name")
Duplicates based on name:
ID  Name        phone
1   John Danny  1111
3   John Danny  2233
5   John Danny  2222
Duplicates based on phone:
ID  Name        phone
1   John Danny  1111
4   jerred      1111
2   Micheal K   2222
5   John Danny  2222
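Only the name-based call is written out above; the phone-based listing presumably comes from the same pattern applied to the phone column. A minimal sketch, assuming the same sample data (the variable name phone_dups is just for illustration, and kind="stable" is added so that tied rows keep their original order):

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["John Danny", "Micheal K", "John Danny", "jerred", "John Danny", "joe"],
    "phone": ["1111", "2222", "2233", "1111", "2222", "6666"],
})

# keep=False marks every member of a duplicate group, not just the repeats.
# A stable sort keeps rows with the same phone in their original order.
phone_dups = df[df["phone"].duplicated(keep=False)].sort_values("phone", kind="stable")
print(phone_dups)
```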
However, I want the result to look like this:
ID  Name        phone  duplicated of name ids  duplicated of phone ids  Duplicate_name  Duplicate_phone
1   John Danny  1111   3,5                     4                        Yes             Yes
2   Micheal K   2222                           5                        No              Yes
3   John Danny  2233   1,5                                              Yes             No
4   jerred      1111                           1                        No              Yes
5   John Danny  2222   1,3                     2                        Yes             Yes
I was able to get the Duplicate_name and Duplicate_phone columns with the code below:
df['Duplicate_name'] = df['Name'].duplicated(keep=False).map({True:'Yes', False:'No'})
df['Duplicate_phone'] = df['phone'].duplicated(keep=False).map({True:'Yes', False:'No'})
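For the two ID columns, one possible approach (a sketch, not necessarily the best way) is to collect the IDs that share each key value and then, for every row, join the IDs in its group other than its own. The helper name other_ids is hypothetical; the column names are taken from the desired output above.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["John Danny", "Micheal K", "John Danny", "jerred", "John Danny", "joe"],
    "phone": ["1111", "2222", "2233", "1111", "2222", "6666"],
})


def other_ids(frame, key):
    """Hypothetical helper: for each row, the IDs of the *other* rows
    that share the same value in column `key`, joined with commas."""
    # Map each distinct key value to the list of IDs that carry it.
    ids_by_value = frame.groupby(key)["ID"].agg(list).to_dict()
    return [
        ",".join(str(i) for i in ids_by_value[value] if i != row_id)
        for row_id, value in zip(frame["ID"], frame[key])
    ]


df["duplicated of name ids"] = other_ids(df, "Name")
df["duplicated of phone ids"] = other_ids(df, "phone")
print(df)
```

Rows with no duplicate end up with an empty string in the corresponding ID column, which lines up with the blank cells in the desired output.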