Saturday, 5 September 2020

Python: inconsistent handling of IF statement in loop

I have a dataframe df containing conditions and values.

import pandas as pd
df = pd.DataFrame({'COND': ['X', 'X', 'X', 'Y', 'Y', 'Y'], 'VALUE': [1, 2, 3, 1, 2, 3]})

So df looks like this:

  COND  VALUE
     X      1
     X      2
     X      3
     Y      1
     Y      2
     Y      3

I'm using a loop to subset df by COND and write a separate text file containing the values for each condition:

for condition in conditions:
    df2 = df[df['COND'].isin([condition])][['VALUE']]
    df2.to_csv(condition + '_values.txt', header=False, index=False)

The end result is two text files, X_values.txt and Y_values.txt, both of which contain 1 2 3. Up to this point everything works as expected.
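For reference, the same per-condition export can also be written with `DataFrame.groupby`. It yields each distinct `COND` value together with its rows, so `conditions` does not have to be built by hand. This is a minimal sketch. The function name `export_by_condition` and the temporary output directory are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'COND': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                   'VALUE': [1, 2, 3, 1, 2, 3]})


def export_by_condition(frame, out_dir):
    """Write one <COND>_values.txt per condition and return the paths written."""
    paths = []
    # groupby('COND') yields (label, sub-DataFrame) pairs, one per distinct COND value
    for condition, group in frame.groupby('COND'):
        path = os.path.join(out_dir, condition + '_values.txt')
        group[['VALUE']].to_csv(path, header=False, index=False)
        paths.append(path)
    return paths


# Write into a throwaway directory and show what each file contains
with tempfile.TemporaryDirectory() as out_dir:
    for path in export_by_condition(df, out_dir):
        with open(path) as fh:
            print(os.path.basename(path), fh.read().split())
```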

I would like to subset df further for one condition only. For example, I might want all values from condition Y, but ONLY values < 3 from condition X. In that scenario, X_values.txt should contain 1 2 and Y_values.txt should contain 1 2 3. I tried implementing this with an IF statement:

for condition in conditions:
    if condition == 'X':
        df = df[df['VALUE'] < 3]

    df2 = df[df['COND'].isin([condition])][['VALUE']]
    df2.to_csv(condition + '_values.txt', header=False, index=False)

This is where the inconsistency occurs. The code above works fine (X_values.txt contains 1 2 and Y_values.txt contains 1 2 3, as intended). When I use if condition=='Y' instead of if condition=='X', however, it breaks, and both text files contain only 1 2.

In other words, if I specify the first element of conditions in the IF statement, it works as intended. If I specify the second element, it breaks and applies the < 3 subset to the values of both conditions.
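A minimal sketch that reproduces the loop without writing files may help show the mechanism. The hypothetical helper `run_loop` mimics the loop above but collects values in a dict. It uses `== condition`, which is equivalent here to `isin([condition])`. The key point is that `df = df[df['VALUE'] < 3]` rebinds `df` itself. That drops every row with VALUE >= 3, whatever its COND, and the smaller frame is what every later iteration subsets. The post does not show how `conditions` is built, so the iteration orders below are assumptions. The results described above match this sketch when Y is iterated before X.

```python
import pandas as pd


def run_loop(conditions, filtered_condition):
    """Mimic the loop from the question, collecting values instead of writing files."""
    df = pd.DataFrame({'COND': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                       'VALUE': [1, 2, 3, 1, 2, 3]})
    results = {}
    for condition in conditions:
        if condition == filtered_condition:
            # Rebinding df drops the VALUE >= 3 rows of *every* condition,
            # and all later iterations see this smaller frame.
            df = df[df['VALUE'] < 3]
        results[condition] = df[df['COND'] == condition]['VALUE'].tolist()
    return results


print(run_loop(['X', 'Y'], 'X'))  # {'X': [1, 2], 'Y': [1, 2]}
print(run_loop(['X', 'Y'], 'Y'))  # {'X': [1, 2, 3], 'Y': [1, 2]}
print(run_loop(['Y', 'X'], 'X'))  # {'Y': [1, 2, 3], 'X': [1, 2]}
print(run_loop(['Y', 'X'], 'Y'))  # {'Y': [1, 2], 'X': [1, 2]}
```

Whether the other condition is affected depends only on whether it comes after the filtered one in `conditions`.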

What is going on here and how can I resolve it?
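One way to avoid the problem is sketched below. It subsets by condition first, into a separate variable (`df2`, as in the original loop), and applies the extra `< 3` filter to that subset only, so `df` is never reassigned. The helper names `export_values` and `read_values` are hypothetical, and the files go to a temporary directory purely for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'COND': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
                   'VALUE': [1, 2, 3, 1, 2, 3]})


def export_values(frame, conditions, filtered_condition, out_dir):
    """Write <condition>_values.txt files, keeping only VALUE < 3 for one condition."""
    for condition in conditions:
        # Subset into a new name; `frame` is never reassigned,
        # so every iteration starts again from the full data.
        df2 = frame[frame['COND'] == condition][['VALUE']]
        if condition == filtered_condition:
            # The extra filter now touches only this condition's rows.
            df2 = df2[df2['VALUE'] < 3]
        df2.to_csv(os.path.join(out_dir, condition + '_values.txt'),
                   header=False, index=False)


def read_values(out_dir, condition):
    """Read back the integers written for one condition."""
    with open(os.path.join(out_dir, condition + '_values.txt')) as fh:
        return [int(v) for v in fh.read().split()]


with tempfile.TemporaryDirectory() as out_dir:
    for order in (['X', 'Y'], ['Y', 'X']):
        for target in ('X', 'Y'):
            export_values(df, order, target, out_dir)
            print(order, target, read_values(out_dir, 'X'), read_values(out_dir, 'Y'))
```

Because each iteration starts from the unmodified frame, the output no longer depends on the order of `conditions`.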

Thanks!
