Friday, 10 August 2018

Pandas assign categories to numbers in dataframe

I have this dataframe:

+----------+----------+------------+------------+
| values_A | values_B | values_A_r | values_B_r |
+----------+----------+------------+------------+
|    1.623 |  1.91232 |        1.6 |     1.9123 |
|    1.582 |  1.32154 |        1.6 |     1.3215 |
+----------+----------+------------+------------+

I want to find the difference between values_A (rounded to 1 decimal place) and values_A_r, as well as the difference between values_B (rounded to 4 decimal places) and values_B_r, and assign the category "Same", "More", or "Less" to each row depending on the difference. If all the differences are "Same", I want to print out a message. This is my current code:

import numpy as np

# `check` is the dataframe shown above.
def test_func():
    # Label each row by comparing the rounded value with the stored one;
    # 1e-10 is the tolerance for floating-point noise.
    check['A_check'] = np.where(abs(round(check.values_A,1) - check.values_A_r)<1e-10, 'Same',
                                 np.where(round(check.values_A,1) > check.values_A_r, 'Less', 'More'))
    check['B_check'] = np.where(abs(round(check.values_B,4) - check.values_B_r)<1e-10, 'Same',
                                 np.where(round(check.values_B,4) > check.values_B_r, 'Less', 'More'))
    # Pass only if every row is 'Same' in both check columns.
    if (set([len(check.index)])==set([check.A_check.value_counts().Same,
                                     check.B_check.value_counts().Same])):
        print('Correct!')
    else:
        raise SystemExit('Incorrect.')
test_func()

I'm currently using nested np.where calls to assign the categories, and it looks really messy. Additionally, I have to use <1e-10 instead of ==0 because sometimes, when the numbers should be "Same", the difference is calculated to be something like 8e-12 instead of exactly 0.
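The tiny non-zero difference described above is ordinary binary floating-point behaviour and can be reproduced without the dataframe. The sketch below uses the textbook 0.1 + 0.2 case as a stand-in (an assumption, not the values from the question). It also shows that in Python `**` is exponentiation, so `1**-10` evaluates to 1.0, whereas the literal `1e-10` is the small tolerance. `np.isclose` is one existing NumPy function for tolerance-based comparison; its default tolerances are looser than 1e-10.

```python
import numpy as np

# Mathematically zero, but not exactly zero in binary floating point.
diff = (0.1 + 0.2) - 0.3
is_exact_zero = diff == 0            # exact comparison fails
is_within_tol = abs(diff) < 1e-10    # tolerance comparison succeeds

# `**` is the power operator; scientific notation uses `e`.
power_value = 1**-10     # 1 raised to the power -10
tolerance = 1e-10        # 0.0000000001

# Element-wise tolerance comparison with NumPy's default rtol/atol.
close = np.isclose(np.array([0.1 + 0.2, 1.6]), np.array([0.3, 1.7]))

print(is_exact_zero, is_within_tol, power_value, tolerance, close)
```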

This process seems really simple and the code makes it look complicated. Is there a neater way to do this?
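One possible way to tidy this up, offered as a sketch rather than a definitive answer: move the comparison into a small helper and use `np.select` instead of nested `np.where`. The helper name `compare` and its `tol` parameter are hypothetical, and the dataframe is rebuilt from the two rows shown in the table. The labels follow the same convention as the code in the question ('Less' when the rounded value is larger than the stored one).

```python
import numpy as np
import pandas as pd

def compare(values, reference, digits, tol=1e-10):
    """Hypothetical helper: label each row 'Same', 'Less' or 'More'."""
    diff = values.round(digits) - reference
    # Conditions are checked in order; the first match wins.
    conditions = [diff.abs() < tol, diff > 0]
    return np.select(conditions, ['Same', 'Less'], default='More')

# The two rows from the table in the question.
check = pd.DataFrame({
    'values_A': [1.623, 1.582],
    'values_B': [1.91232, 1.32154],
    'values_A_r': [1.6, 1.6],
    'values_B_r': [1.9123, 1.3215],
})

check['A_check'] = compare(check['values_A'], check['values_A_r'], 1)
check['B_check'] = compare(check['values_B'], check['values_B_r'], 4)

# True only if every row is 'Same' in both check columns.
all_same = bool((check[['A_check', 'B_check']] == 'Same').all().all())
print('Correct!' if all_same else 'Incorrect.')
```

Checking `(... == 'Same').all().all()` also avoids the `value_counts().Same` lookup, which raises an error when a column contains no "Same" rows at all.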
