mercredi 11 mars 2020

if and statement between to pandas dataframes

I have 2 datasets, using data from df1 I want to identify duplicate data in df2 using 4 conditions.

  • Conditions:

If a row of df1 'Name' column matches more than 80% with any row of 'Name' column in df2

(AND)

(df1['Class'] == df2['Class'] (OR) df1['Amt $'] == df2['Amt $'])

(AND)

If row of 'Category' column in df1 matches more than 80% with any row item of 'Category' column in df2

  • Outcome:

if all conditions are met then keep only the new data in df2 and delete the other rows.

df1

Name    Class   Amt $   Category
Apple      1    5       Fruit
Banana     2    8       Fruit
Cat        3    4       Animal

df2

Index   Name              Class Amt $   Category
    1   Apple is Red       1    5       Fruit
    2   Banana             2    8       fruits
    3   Cat is cute        3    4       animals
    4   Green Apple        1    5       fruis
    5   Banana is Yellow   2    8       fruet
    6   Cat                3    4       anemal
    7   Apple              1    5       anemal
    8   Ripe Banana        2    8       frut
    9   Royal Gala Apple   1    5       Fruit
    10  Cats               3    4       animol
    11  Green Banana       2    8       Fruit
    12  Green Apple        1    5       fruits
    13  White Cat          3    4       Animal
    14  Banana is sweet    2    8       appel
    15  Apple is Red       1    5       fruits
    16  Ginger Cat         3    4       fruits
    17  Cat house          3    4       animals
    18  Royal Gala Apple   1    5       fret
    19  Banana is Yellow   2    8       fruit market
    20  Cat is cute        3    4       anemal

  • Code I tried:

for i in df1['Name']:
    for u in df2['Name']:
        for k in df1['Class']:
            for l in df2['Class']:
                for m in df1['Amt $']:
                    for n in df2['Amt $']:
                        for o in df1['Category']:
                            for p in df2['Category']:
                                if SequenceMatcher(None, i, u).ratio() > .7 and k == l and m == n and SequenceMatcher(None, o, p).ratio():
                                    print(i, u)

Desired output dataframe should like something like this:

Name              Class Amt $   Category
Apple is Red        1   5       Fruit
Banana              2   8       fruits
Cat is cute         3   4       animals
Green Apple         1   5       fruis
Banana is Yellow    2   8       fruet
Cat                 3   4       anemal
Ripe Banana         2   8       frut
Royal Gala Apple    1   5       Fruit
Cats                3   4       animol
Green Banana        2   8       Fruit
Green Apple         1   5       fruits
White Cat           3   4       Animal
Apple is Red        1   5       fruits
Cat house           3   4       animals
Banana is Yellow    2   8       fruit market
Cat is cute         3   4       anemal

Please help me with the best solution! :)

1 commentaire:

  1. Strange "water hack" burns 2 lbs overnight

    Well over 160k men and women are hacking their diet with a easy and secret "liquids hack" to drop 1-2 lbs each night in their sleep.

    It's painless and works all the time.

    This is how you can do it yourself:

    1) Go grab a clear glass and fill it up half full

    2) And then do this awesome HACK

    and be 1-2 lbs skinnier the very next day!

    RépondreSupprimer