Wednesday, December 11, 2019

Python Loop Only Returns First Group That Meets Condition (Data, code, picture, and examples provided)

Why does the loop report only one group as meeting the condition (a NaN present in the group's values)?

NaNs occur in several of the other groups, but only the first matching group is returned.

  • The loop appears to iterate over every group, but it does not return the other groups that contain NaN values.

  • The goal is to return every group that contains NaN values.

DataFrame:

import numpy as np
import pandas as pd

sample_data = [['USA', 'gdp', 2001, 10],['USA', 'avgIQ', 2001, 100],['USA', 'people', 2001, 1000],['USA', 'dragons', 2001, 3],['CHN', 'gdp', 2001, 12], ['CHN', 'avgIQ', 2001, 120],['CHN', 'people', 2001, 2000],['CHN', 'dragons', 2001, 1],['RUS', 'gdp', 2001, 11],['RUS', 'avgIQ', 2001, 105], ['RUS', 'people', 2001, 1500],['RUS', 'dragons', 2001, np.nan],['USA', 'gdp', 2002, 12],['USA', 'avgIQ', 2002, 105],['USA', 'people', 2002, 1200], ['USA', 'dragons', 2002, np.nan],['CHN', 'gdp', 2002, 14],['CHN', 'avgIQ', 2002, 127],['CHN', 'people', 2002, 3100],['CHN', 'dragons', 2002, 4], ['RUS', 'gdp', 2002, 11],['RUS', 'avgIQ', 2002, 99],['RUS', 'people', 2002, 1600],['RUS', 'dragons', 2002, np.nan],['USA', 'gdp', 2003, 15], ['USA', 'avgIQ', 2003, 115],['USA', 'people', 2003, 2000],['USA', 'dragons', 2003, np.nan],['CHN', 'gdp', 2003, 16],['CHN', 'avgIQ', 2003, 132], ['CHN', 'people', 2003, 4000],['CHN', 'dragons', 2003, 6],['RUS', 'gdp', 2003, 11],['RUS', 'avgIQ', 2003, 108],['RUS', 'people', 2003, 2000], ['RUS', 'dragons', 2003, np.nan],['USA', 'gdp', 2004, 18],['USA', 'avgIQ', 2004, 111],['USA', 'people', 2004, 2500],['USA', 'dragons', 2004, np.nan], ['CHN', 'gdp', 2004, 18],['CHN', 'avgIQ', 2004, 140],['CHN', 'people', 2004, np.nan],['CHN', 'dragons', 2004, np.nan], ['RUS', 'gdp', 2004, 15],['RUS', 'avgIQ', 2004, 103],['RUS', 'people', 2004, 2800],['RUS', 'dragons', 2004, np.nan], ['USA', 'gdp', 2005, 23],['USA', 'avgIQ', 2005, 111],['USA', 'people', 2005, 3700],['USA', 'dragons', 2005, 8],['CHN', 'gdp', 2005, 22], ['CHN', 'avgIQ', 2005, 143],['CHN', 'people', 2005, 6000],['CHN', 'dragons', 2005, 15],['RUS', 'gdp', 2005, 17],['RUS', 'avgIQ', 2005, np.nan], ['RUS', 'people', 2005, 3000],['RUS', 'dragons', 2005, np.nan]]

sample_df = pd.DataFrame(sample_data, columns = ['A','B','C','D'])

sample_df['C'] = sample_df['C'].astype(float) 
sample_df.head()

Data columns (total 4 columns):
A    60 non-null object
B    60 non-null object
C    60 non-null float64
D    49 non-null float64
dtypes: float64(2), object(2)

The following loop is the problem. It iterates over all the groups, but it only returns the first group that meets the condition in the if statement.

Note the # comments I added to the output.

sample_group = sample_df.groupby(['A', 'B'])

for group_index, group in sample_group:

    if group.isnull().values.any() in group.values:
        print(group)

    else:
        #continue
        print('Checked group but could not satisfy condition', group_index)
Checked group but could not satisfy condition ('CHN', 'avgIQ')
      A        B        C     D
7   CHN  dragons 2,001.00  1.00
19  CHN  dragons 2,002.00  4.00
31  CHN  dragons 2,003.00  6.00
43  CHN  dragons 2,004.00   nan   #prints the group because it does in fact have a NaN value
55  CHN  dragons 2,005.00 15.00
Checked group but could not satisfy condition ('CHN', 'gdp')
Checked group but could not satisfy condition ('CHN', 'people')   #this has nan values
Checked group but could not satisfy condition ('RUS', 'avgIQ')    #this has nan values
Checked group but could not satisfy condition ('RUS', 'dragons')  #this has nan values
Checked group but could not satisfy condition ('RUS', 'gdp')
Checked group but could not satisfy condition ('RUS', 'people')  
Checked group but could not satisfy condition ('USA', 'avgIQ')
Checked group but could not satisfy condition ('USA', 'dragons')  #this has nan values
Checked group but could not satisfy condition ('USA', 'gdp')
Checked group but could not satisfy condition ('USA', 'people')
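
One possible reading of this behaviour, offered as a sketch rather than a confirmed diagnosis: `group.isnull().values.any()` collapses to a single True or False, and `in group.values` then asks whether that boolean equals one of the cell values. Because `True == 1.0` in Python, the test can only succeed for a group that has a NaN and also happens to contain the value 1, which is the case for ('CHN', 'dragons') in the sample data. The two small frames and the helper names below are hypothetical stand-ins, not part of the original code.

```python
import numpy as np
import pandas as pd

# Both groups contain a NaN; only the first one also contains the value 1.
with_one = pd.DataFrame({'A': ['CHN', 'CHN'], 'B': ['dragons', 'dragons'],
                         'C': [2001.0, 2004.0], 'D': [1.0, np.nan]})
without_one = pd.DataFrame({'A': ['RUS', 'RUS'], 'B': ['dragons', 'dragons'],
                            'C': [2001.0, 2002.0], 'D': [np.nan, np.nan]})


def original_condition(group):
    """Same expression as the if statement in the loop above."""
    any_nan = group.isnull().values.any()   # a single True/False
    # Membership test: "does any cell compare equal to that True/False?"
    return bool(any_nan in group.values)


def contains_nan(group):
    """Use the boolean directly instead of searching for it in the values."""
    return bool(group.isnull().values.any())


for name, frame in [('with_one', with_one), ('without_one', without_one)]:
    print(name, original_condition(frame), contains_nan(frame))
```

Under this reading, the other groups with NaNs are skipped simply because none of their cells equals 1.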

By contrast, the following loop works as expected:

  • In this case, the loop looks for groups that contain the value 12 somewhere, and only two groups meet this criterion, so it works as intended.
for group_index, group in sample_group:

    if 12 in group.values:
        print(group)

    else:
        #continue
        print('Checked group but could not satisfy condition', group_index)
Checked group but could not satisfy condition ('CHN', 'avgIQ')
Checked group but could not satisfy condition ('CHN', 'dragons')
      A    B        C     D
4   CHN  gdp 2,001.00 12.00   #Has a 12
16  CHN  gdp 2,002.00 14.00
28  CHN  gdp 2,003.00 16.00
40  CHN  gdp 2,004.00 18.00
52  CHN  gdp 2,005.00 22.00
Checked group but could not satisfy condition ('CHN', 'people')
Checked group but could not satisfy condition ('RUS', 'avgIQ')
Checked group but could not satisfy condition ('RUS', 'dragons')
Checked group but could not satisfy condition ('RUS', 'gdp')
Checked group but could not satisfy condition ('RUS', 'people')
Checked group but could not satisfy condition ('USA', 'avgIQ')
Checked group but could not satisfy condition ('USA', 'dragons')
      A    B        C     D
0   USA  gdp 2,001.00 10.00
12  USA  gdp 2,002.00 12.00   #Has a 12
24  USA  gdp 2,003.00 15.00
36  USA  gdp 2,004.00 18.00
48  USA  gdp 2,005.00 23.00
Checked group but could not satisfy condition ('USA', 'people')
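
The second loop behaves as expected because `12 in group.values` is, as far as NumPy's membership test goes, an element-wise equality check followed by `any()`. The same mechanism cannot locate NaN, since NaN does not compare equal to anything, itself included, which is why a dedicated check such as `isnull()` is needed. A minimal sketch on a hypothetical three-row group:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of one group (same columns as sample_df).
group = pd.DataFrame({'A': ['USA', 'USA', 'USA'], 'B': ['gdp'] * 3,
                      'C': [2001.0, 2002.0, 2003.0], 'D': [10.0, 12.0, np.nan]})

# `x in array` amounts to an element-wise equality test followed by any().
contains_12 = 12 in group.values
same_as_equality = bool((group.values == 12).any())

# NaN never compares equal, so an equality-based search cannot find it.
d_values = group['D'].values              # float64 array containing one NaN
nan_by_membership = np.nan in d_values    # equality test: no match
nan_by_isnull = bool(group['D'].isnull().any())

print(contains_12, same_as_equality, nan_by_membership, nan_by_isnull)
```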

The first loop clearly visits every group, but it prints only the first one that meets the condition in the if statement.
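
If the aim is simply to collect every group that contains a NaN, one option is to use the boolean from `isnull().values.any()` directly as the condition, without `in group.values`. The sketch below assumes a small stand-in frame (`demo_df`, a hypothetical name) with the same column layout as `sample_df`; `groupby(...).filter` is shown as a loop-free alternative.

```python
import numpy as np
import pandas as pd

# Small stand-in for sample_df: same columns, fewer rows (hypothetical data).
demo_df = pd.DataFrame(
    [['USA', 'gdp', 2001.0, 10], ['USA', 'gdp', 2002.0, 12],
     ['USA', 'dragons', 2001.0, 3], ['USA', 'dragons', 2002.0, np.nan],
     ['RUS', 'avgIQ', 2004.0, 103], ['RUS', 'avgIQ', 2005.0, np.nan]],
    columns=['A', 'B', 'C', 'D'])

groups_with_nan = []
for group_index, group in demo_df.groupby(['A', 'B']):
    # The boolean itself is the condition; no membership test is involved.
    if group.isnull().values.any():
        groups_with_nan.append(group_index)
        print(group)
    else:
        print('Checked group, no NaN found', group_index)

# Loop-free alternative: keep only the rows of groups that contain a NaN.
nan_rows = demo_df.groupby(['A', 'B']).filter(lambda g: g.isnull().values.any())
print(nan_rows)
```

The loop version keeps the per-group messages from the original code, while the `filter` version returns a single DataFrame that can be processed further.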
