Friday, 15 January 2021

How do I check for duplicate values in a DataFrame column and conditionally drop rows based on frequency?

DataFrame:

name   Location      Rating  Frequency
Ali    Nasi Kandar   1 star  1
Ali    Baskin Robin  4 star  3
Ali    Nasi Ayam     3 star  1
Ali    Burgergrill   2 star  2
Lee    Fries         1 star  3
Abu    Mcdonald      3 star  3
Abu    KFC           3 star  1
Ahmad  Nandos        3 star  2
Ahmad  Burgerdhil    2 star  3
Ahmad  Kebab         1 star  10

Here is the sample data set. The logic would be:

1st condition: If a name has duplicate values, compare the frequencies and keep the row with the highest one, dropping the rows with lower frequency.

2nd condition: If the name has no duplicates (e.g. Lee), keep the row.

3rd condition: If the rating is the same (e.g. Abu), keep the first value.

Desired output:

name   Location      Rating  Frequency
Ali    Baskin Robin  4 star  3
Lee    Fries         1 star  3
Abu    KFC           3 star  1
Ahmad  Kebab         1 star  10

Does anyone know how to do this in Python with pandas or PySpark?

I ran into trouble checking for duplicates and applying the "if condition" to this DataFrame.
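
One possible pandas sketch of the three conditions, not a definitive answer: group by name and keep the row with the highest Frequency in each group. It assumes the column names shown in the sample, that Frequency is numeric, and that ties should go to the first row in the original order. The variable names df and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Ali", "Ali", "Ali", "Ali", "Lee",
             "Abu", "Abu", "Ahmad", "Ahmad", "Ahmad"],
    "Location": ["Nasi Kandar", "Baskin Robin", "Nasi Ayam", "Burgergrill",
                 "Fries", "Mcdonald", "KFC", "Nandos", "Burgerdhil", "Kebab"],
    "Rating": ["1 star", "4 star", "3 star", "2 star", "1 star",
               "3 star", "3 star", "3 star", "2 star", "1 star"],
    "Frequency": [1, 3, 1, 2, 3, 3, 1, 2, 3, 10],
})

# For each name, find the index label of the row with the highest Frequency.
# idxmax returns the first occurrence when several rows share the maximum,
# and a name that appears only once simply keeps its single row.
# sort=False keeps the names in their order of first appearance.
best_rows = df.groupby("name", sort=False)["Frequency"].idxmax()

# Select those rows and renumber the index.
result = df.loc[best_rows].reset_index(drop=True)

print(result)
```

With the sample data this keeps Baskin Robin for Ali, Fries for Lee, and Kebab for Ahmad. For Abu it keeps Mcdonald (Frequency 3, and also the first row), whereas the desired output above lists KFC, so the 3rd condition may need to be restated if KFC is really the intended row. In PySpark, a similar result could probably be obtained with row_number() over a Window partitioned by name and ordered by Frequency descending, though the winner among tied rows would depend on how the ordering is defined.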
