Thursday, March 22, 2018

Combine an if condition with an isnan check

I am trying to use an if statement to check whether row values are NaN or not. It turns out to be more difficult than I thought.

Here is an example:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'NaN', 'A1', 'B1','NaN']},
                   columns = ['key', 'data1', 'data2'])

def set_perf(row):
    # keep data1 when the key contains "C" and data2 is the string "NaN"
    if "C" in row['key'] and row['data2'] == "NaN":
        return row['data1']
    else:
        return 1

df['NewColumn'] = df.apply(set_perf, axis=1)

The output is:

  key  data1 data2  NewColumn
0   A      0    A1          1
1   B      1    B1          1
2   C      2   NaN          2
3   A      3    A1          1
4   B      4    B1          1
5   C      5   NaN          5

The output is what I am looking for: adding the extra condition (row['data2']=="NaN") to the if statement lets me identify the NaN values.
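One thing worth checking at this point: in the toy frame, 'NaN' is typed in as ordinary text, so the comparison is a plain string match rather than a missing-value test. A minimal sketch, assuming a cut-down copy of the example (the name `toy` is made up for illustration):

```python
import pandas as pd

# Cut-down copy of the example frame; 'NaN' is entered as a string literal
toy = pd.DataFrame({'key': ['A', 'B', 'C'],
                    'data2': ['A1', 'B1', 'NaN']})

value = toy.loc[2, 'data2']
is_string = isinstance(value, str)              # the cell holds the text 'NaN'
is_missing = pd.isna(value)                     # pandas does not count it as missing
missing_count = int(toy['data2'].isna().sum())  # number of real missing values

print(is_string, is_missing, missing_count)
```

So the condition succeeds here only because the cell literally contains the three characters N, a, N.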

I applied exactly the same logic to my original dataset, but it didn't work. Here is a snapshot:

      NewPerfColumn sec_type tran_type LDI Bucket  Alpha vs Markit
0             1.000     GOVT        BB        NaN      3283.400526
1             1.000     GOVT        BB        NaN      6710.130364
2             1.000     GOVT        BB        NaN      3266.912122
3             1.000     GOVT        BB        NaN    113401.946471
4             1.000     GOVT        BB        NaN      1938.494818
5             1.000     GOVT        BB        NaN      9505.724498
6             1.000     GOVT        BB        NaN       192.196620
7             1.000  MUNITAX       RRP        NaN    -97968.750000

When I add (row['LDI Bucket']=="NaN") to the if condition, the "NaN" value is not recognized. Here are the distinct values of the "LDI Bucket" column:

data['LDI Bucket'].unique()
array([nan, u'0-3m', u'3-6m', u'6-9m', u'9m-1y'], dtype=object)

Have I missed anything?
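A likely explanation, judging from the unique() output: the array shows nan without quotes, which suggests a real floating-point missing value rather than the string "NaN", and a float nan never compares equal to a string (or even to itself). Below is a minimal sketch of testing it with pd.isna instead. The three-row frame and the 'GOVT' condition are assumptions made up for illustration, since the original condition is not shown; only the column names come from the snapshot.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values invented for illustration)
data = pd.DataFrame({
    'sec_type': ['GOVT', 'GOVT', 'MUNITAX'],
    'LDI Bucket': [np.nan, '0-3m', np.nan],
    'Alpha vs Markit': [3283.4, 6710.1, -97968.75],
})

nan_equals_string = (np.nan == "NaN")   # False: a float nan is not the text "NaN"

def set_perf(row):
    # pd.isna is True for a real missing value; the 'GOVT' test is an assumed condition
    if row['sec_type'] == 'GOVT' and pd.isna(row['LDI Bucket']):
        return row['Alpha vs Markit']
    return 1

data['NewPerfColumn'] = data.apply(set_perf, axis=1)

# Same logic without apply: build a boolean mask and choose values with np.where
mask = (data['sec_type'] == 'GOVT') & data['LDI Bucket'].isna()
data['NewPerfVectorized'] = np.where(mask, data['Alpha vs Markit'], 1)
```

The vectorized form tends to be faster on large frames, and it is the place where & belongs, since it combines two boolean Series element by element.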
