DataFrame:
| name | Location | Rating | Frequency |
|---|---|---|---|
| Ali | Nasi Kandar | 1 star | 1 |
| Ali | Baskin Robin | 4 star | 3 |
| Ali | Nasi Ayam | 3 star | 1 |
| Ali | Burgergrill | 2 star | 2 |
| Lee | Fries | 1 star | 3 |
| Abu | Mcdonald | 3 star | 3 |
| Abu | KFC | 3 star | 1 |
| Ahmad | Nandos | 3 star | 2 |
| Ahmad | Burgerdhil | 2 star | 3 |
| Ahmad | Kebab | 1 star | 10 |
Here is the sample data set. The logic would be:
1st condition: if a name appears more than once, compare the frequencies and keep the row with the highest frequency, dropping the rows with lower frequency
2nd condition: if a name is not duplicated (e.g. Lee), keep the row
3rd condition: if the rating is the same (e.g. Abu), keep the first row
Desired Output:
| name | Location | Rating | Frequency |
|---|---|---|---|
| Ali | Baskin Robin | 4 star | 3 |
| Lee | Fries | 1 star | 3 |
| Abu | KFC | 3 star | 1 |
| Ahmad | Kebab | 1 star | 10 |
Does anyone know how to do this in Python pandas or PySpark?
I am having trouble checking for duplicates and applying the conditional logic to this DataFrame.
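One possible sketch in pandas, not a definitive answer: group by `name` and keep the row with the highest `Frequency` in each group. It assumes the third condition is meant as a tie-breaker (when rows tie, the first one wins), which is how `idxmax` behaves. Note that under the rules as written this keeps Mcdonald for Abu (frequency 3, and also the first row), not KFC as listed in the desired output, so the rule for Abu may need adjusting.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "name": ["Ali", "Ali", "Ali", "Ali", "Lee",
             "Abu", "Abu", "Ahmad", "Ahmad", "Ahmad"],
    "Location": ["Nasi Kandar", "Baskin Robin", "Nasi Ayam", "Burgergrill",
                 "Fries", "Mcdonald", "KFC", "Nandos", "Burgerdhil", "Kebab"],
    "Rating": ["1 star", "4 star", "3 star", "2 star", "1 star",
               "3 star", "3 star", "3 star", "2 star", "1 star"],
    "Frequency": [1, 3, 1, 2, 3, 3, 1, 2, 3, 10],
})

# For each name, find the index label of the row with the highest Frequency.
# idxmax returns the first occurrence when several rows tie for the maximum.
# sort=False keeps the names in their order of first appearance.
keep = df.groupby("name", sort=False)["Frequency"].idxmax()

# Select those rows; names that appear only once (e.g. Lee) are kept unchanged.
result = df.loc[keep].reset_index(drop=True)
print(result)
```

If the row order does not matter, sorting by `Frequency` in descending order with a stable sort and then calling `drop_duplicates(subset="name")` should give the same rows.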