I am trying to use an if statement to check whether row values are NaN. It turns out to be more difficult than I thought.
Here is an example:
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'NaN', 'A1', 'B1', 'NaN']},
                  columns=['key', 'data1', 'data2'])

def set_perf(row):
    # Return data1 for "C" rows whose data2 is the string "NaN", otherwise 1
    if ("C" in row['key']) and (row['data2'] == "NaN"):
        return row['data1']
    else:
        return 1

df['NewColumn'] = df.apply(set_perf, axis=1)
The output is:
  key  data1 data2  NewColumn
0   A      0    A1          1
1   B      1    B1          1
2   C      2   NaN          2
3   A      3    A1          1
4   B      4    B1          1
5   C      5   NaN          5
The output is what I am looking for: I can identify the NaN value by adding another condition, (row['data2'] == "NaN"), to the if statement.
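A minimal sketch of why the comparison works here: in the toy frame, 'NaN' is an ordinary three-character string, not a missing value. The series names `toy` and `real` below are made up for illustration; `real` assumes a column that holds an actual floating-point NaN (`np.nan`).

```python
import numpy as np
import pandas as pd

# Toy case: the literal string 'NaN', as in the example above
toy = pd.Series(['A1', 'B1', 'NaN'])
# Assumed real-data case: an actual missing value stored as a float NaN
real = pd.Series(['A1', 'B1', np.nan], dtype=object)

toy_type = type(toy[2]).__name__     # 'str'
real_type = type(real[2]).__name__   # 'float'

# Comparing against the string "NaN" only matches the string
string_match = (toy == "NaN").tolist()
missing_match = (real == "NaN").tolist()

# A real NaN is not even equal to itself, so == cannot detect it
self_equal = real[2] == real[2]

# pd.isna flags real missing values (pd.isnull is the older alias)
isna_flags = pd.isna(real).tolist()

print(toy_type, real_type, string_match, missing_match, self_equal, isna_flags)
```

So an equality test against "NaN" can only succeed when the column really contains that string.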
I applied exactly the same logic to my original dataset, but it didn't work. Here is a snapshot:
   NewPerfColumn sec_type tran_type LDI Bucket  Alpha vs Markit
0          1.000     GOVT        BB        NaN      3283.400526
1          1.000     GOVT        BB        NaN      6710.130364
2          1.000     GOVT        BB        NaN      3266.912122
3          1.000     GOVT        BB        NaN    113401.946471
4          1.000     GOVT        BB        NaN      1938.494818
5          1.000     GOVT        BB        NaN      9505.724498
6          1.000     GOVT        BB        NaN       192.196620
7          1.000  MUNITAX       RRP        NaN    -97968.750000
When I add (row['LDI Bucket'] == "NaN") to the if condition, the "NaN" value is not recognized. Here are the distinct values of the "LDI Bucket" column:
data['LDI Bucket'].unique()
array([nan, u'0-3m', u'3-6m', u'6-9m', u'9m-1y'], dtype=object)
Have I missed anything?
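One possible explanation, sketched under assumptions: the `nan` shown by `unique()` (unquoted, unlike `u'0-3m'`) looks like a real missing value rather than the string "NaN", so an equality test never matches it. The frame below is a small hypothetical stand-in for the real dataset, and the `sec_type == 'GOVT'` condition is invented for illustration, since the actual condition is not shown. `pd.isna` is available in recent pandas; older versions spell it `pd.isnull`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values are illustrative)
data = pd.DataFrame({
    'sec_type': ['GOVT', 'GOVT', 'MUNITAX', 'GOVT'],
    'LDI Bucket': [np.nan, '0-3m', np.nan, '3-6m'],
    'Alpha vs Markit': [3283.4, 6710.1, -97968.75, 192.2],
})

def set_perf(row):
    # pd.isna is True for a real missing value, which == "NaN" never matches
    if row['sec_type'] == 'GOVT' and pd.isna(row['LDI Bucket']):
        return row['Alpha vs Markit']
    return 1

data['NewPerfColumn'] = data.apply(set_perf, axis=1)

# Same logic without a row-wise apply: build a boolean mask, then select
mask = (data['sec_type'] == 'GOVT') & data['LDI Bucket'].isna()
data['NewPerfVectorized'] = np.where(mask, data['Alpha vs Markit'], 1)

print(data)
```

The vectorized version is usually the faster of the two on large frames, but both rely on the same idea: test for missing values with `isna` instead of comparing to a string.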