Why does the loop think that only one group meets the condition (a NaN present in the group's values)?
There are NaNs in several other groups, but only the first such group is returned.
The loop appears to iterate over every group, yet it does not return the others that contain NaN values.
The goal is to return the groups that have NaN values.
DataFrame:
import numpy as np
import pandas as pd
sample_data = [['USA', 'gdp', 2001, 10],['USA', 'avgIQ', 2001, 100],['USA', 'people', 2001, 1000],['USA', 'dragons', 2001, 3],['CHN', 'gdp', 2001, 12], ['CHN', 'avgIQ', 2001, 120],['CHN', 'people', 2001, 2000],['CHN', 'dragons', 2001, 1],['RUS', 'gdp', 2001, 11],['RUS', 'avgIQ', 2001, 105], ['RUS', 'people', 2001, 1500],['RUS', 'dragons', 2001, np.nan],['USA', 'gdp', 2002, 12],['USA', 'avgIQ', 2002, 105],['USA', 'people', 2002, 1200], ['USA', 'dragons', 2002, np.nan],['CHN', 'gdp', 2002, 14],['CHN', 'avgIQ', 2002, 127],['CHN', 'people', 2002, 3100],['CHN', 'dragons', 2002, 4], ['RUS', 'gdp', 2002, 11],['RUS', 'avgIQ', 2002, 99],['RUS', 'people', 2002, 1600],['RUS', 'dragons', 2002, np.nan],['USA', 'gdp', 2003, 15], ['USA', 'avgIQ', 2003, 115],['USA', 'people', 2003, 2000],['USA', 'dragons', 2003, np.nan],['CHN', 'gdp', 2003, 16],['CHN', 'avgIQ', 2003, 132], ['CHN', 'people', 2003, 4000],['CHN', 'dragons', 2003, 6],['RUS', 'gdp', 2003, 11],['RUS', 'avgIQ', 2003, 108],['RUS', 'people', 2003, 2000], ['RUS', 'dragons', 2003, np.nan],['USA', 'gdp', 2004, 18],['USA', 'avgIQ', 2004, 111],['USA', 'people', 2004, 2500],['USA', 'dragons', 2004, np.nan], ['CHN', 'gdp', 2004, 18],['CHN', 'avgIQ', 2004, 140],['CHN', 'people', 2004, np.nan],['CHN', 'dragons', 2004, np.nan], ['RUS', 'gdp', 2004, 15],['RUS', 'avgIQ', 2004, 103],['RUS', 'people', 2004, 2800],['RUS', 'dragons', 2004, np.nan], ['USA', 'gdp', 2005, 23],['USA', 'avgIQ', 2005, 111],['USA', 'people', 2005, 3700],['USA', 'dragons', 2005, 8],['CHN', 'gdp', 2005, 22], ['CHN', 'avgIQ', 2005, 143],['CHN', 'people', 2005, 6000],['CHN', 'dragons', 2005, 15],['RUS', 'gdp', 2005, 17],['RUS', 'avgIQ', 2005, np.nan], ['RUS', 'people', 2005, 3000],['RUS', 'dragons', 2005, np.nan]]
sample_df = pd.DataFrame(sample_data, columns = ['A','B','C','D'])
sample_df['C'] = sample_df['C'].astype(float)
sample_df.head()
sample_df.info()
Data columns (total 4 columns):
A 60 non-null object
B 60 non-null object
C 60 non-null float64
D 49 non-null float64
dtypes: float64(2), object(2)
The following loop is the problem. It runs through all the groups, but only prints the first group that meets the condition in the if-statement.
Note the # comments I added to the output.
sample_group = sample_df.groupby(['A', 'B'])
for group_index, group in sample_group:
    if group.isnull().values.any() in group.values:
        print(group)
    else:
        #continue
        print('Checked group but could not satisfy condition', group_index)
Checked group but could not satisfy condition ('CHN', 'avgIQ')
A B C D
7 CHN dragons 2,001.00 1.00
19 CHN dragons 2,002.00 4.00
31 CHN dragons 2,003.00 6.00
43 CHN dragons 2,004.00 nan #prints the group because it does in fact have an nan value
55 CHN dragons 2,005.00 15.00
Checked group but could not satisfy condition ('CHN', 'gdp')
Checked group but could not satisfy condition ('CHN', 'people') #this has nan values
Checked group but could not satisfy condition ('RUS', 'avgIQ') #this has nan values
Checked group but could not satisfy condition ('RUS', 'dragons') #this has nan values
Checked group but could not satisfy condition ('RUS', 'gdp')
Checked group but could not satisfy condition ('RUS', 'people')
Checked group but could not satisfy condition ('USA', 'avgIQ')
Checked group but could not satisfy condition ('USA', 'dragons') #this has nan values
Checked group but could not satisfy condition ('USA', 'gdp')
Checked group but could not satisfy condition ('USA', 'people')
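One likely explanation for this output, offered as a sketch rather than a confirmed diagnosis: `group.isnull().values.any()` already collapses to a single boolean, so the condition becomes `True in group.values`. NumPy's membership test compares by equality, and `1.0 == True` in Python, so a group passes only if it has a NaN and also happens to contain the value 1 (as ('CHN', 'dragons') does in 2001). The two small frames below are hypothetical stand-ins for groups, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a group that has a NaN and also the value 1.0.
group = pd.DataFrame({"A": ["CHN", "CHN"], "D": [1.0, np.nan]})

has_nan = group.isnull().values.any()   # a single boolean, not an array
print(has_nan)

# `True in group.values` asks whether any cell equals True; 1.0 == True,
# so the test passes because of the 1.0, not because of the NaN.
passes = has_nan in group.values
print(passes)

# Same NaN, but no cell equal to 1.0: the membership test now fails.
other = pd.DataFrame({"A": ["RUS", "RUS"], "D": [3.0, np.nan]})
other_passes = other.isnull().values.any() in other.values
print(other_passes)
```

If this reading is right, the groups flagged with # above are skipped because none of them contains a 1, even though each contains a NaN.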
By contrast, the following works as expected.
Here the loop looks for groups that contain the value 12 somewhere; only two groups meet this criterion, and both are printed.
for group_index, group in sample_group:
    if 12 in group.values:
        print(group)
    else:
        #continue
        print('Checked group but could not satisfy condition', group_index)
Checked group but could not satisfy condition ('CHN', 'avgIQ')
Checked group but could not satisfy condition ('CHN', 'dragons')
A B C D
4 CHN gdp 2,001.00 12.00 #Has a 12
16 CHN gdp 2,002.00 14.00
28 CHN gdp 2,003.00 16.00
40 CHN gdp 2,004.00 18.00
52 CHN gdp 2,005.00 22.00
Checked group but could not satisfy condition ('CHN', 'people')
Checked group but could not satisfy condition ('RUS', 'avgIQ')
Checked group but could not satisfy condition ('RUS', 'dragons')
Checked group but could not satisfy condition ('RUS', 'gdp')
Checked group but could not satisfy condition ('RUS', 'people')
Checked group but could not satisfy condition ('USA', 'avgIQ')
Checked group but could not satisfy condition ('USA', 'dragons')
A B C D
0 USA gdp 2,001.00 10.00
12 USA gdp 2,002.00 12.00 #Has a 12
24 USA gdp 2,003.00 15.00
36 USA gdp 2,004.00 18.00
48 USA gdp 2,005.00 23.00
Checked group but could not satisfy condition ('USA', 'people')
The first loop clearly goes over each group, but only prints the first one that meets the if-statement criteria.
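A minimal sketch of one way to get the intended result, assuming the goal is simply the groups that contain at least one NaN: use the boolean from `isnull().values.any()` directly as the condition rather than testing it with `in`. The frame `df` below is an abbreviated, hypothetical stand-in for `sample_df` with the same column layout.

```python
import numpy as np
import pandas as pd

# Abbreviated, hypothetical stand-in for sample_df (same columns A, B, C, D).
df = pd.DataFrame(
    [["USA", "gdp", 2001, 10], ["USA", "gdp", 2002, 12],
     ["USA", "dragons", 2001, 3], ["USA", "dragons", 2002, np.nan],
     ["RUS", "dragons", 2001, np.nan], ["RUS", "dragons", 2002, np.nan]],
    columns=["A", "B", "C", "D"],
)

groups_with_nan = []
for group_index, group in df.groupby(["A", "B"]):
    # isnull().values.any() is already True/False: does any cell hold a NaN?
    if group.isnull().values.any():
        groups_with_nan.append(group_index)

print(groups_with_nan)

# Alternative without an explicit loop: keep the rows of every group
# for which the predicate returns True.
rows = df.groupby(["A", "B"]).filter(lambda g: g.isnull().values.any())
print(rows)
```

The loop form keeps the per-group printing from the question; `filter` is more convenient when the matching rows are needed as a single DataFrame.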