Thursday, June 4, 2020

Conditionally concatenate variable names into a new variable in Python

I have a data set with 3 columns and occasional NAs. I am trying to create a new string column called 'check' that concatenates the names of the columns that are not NA in each row, separated by underscores ('_'). My code is pasted below, along with the data that I have, the data that I need, and what I actually get (see the hyperlinks after the code). For some reason, the conditional seems to be completely ignored: example_set['check'] = example_set['check'] + column runs on every iteration, with or without the conditional block. I assume there is a Python/pandas quirk that I haven't fully understood. Can you please help?

import numpy as np
import pandas as pd

example_set = pd.DataFrame({
    'A': [3, 4, np.nan],
    'B': [1, np.nan, np.nan],
    'C': [3, 4, 5],
})
example_set

columns = list(example_set.columns)

example_set['check'] = '_'

for column in columns:
    for row in range(example_set.shape[0]):
        if example_set[column][row] != np.nan:
            example_set['check'] =  example_set['check'] + column
        else:
            continue

example_set
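
Two things likely explain this behavior. First, NaN never compares equal to anything, including itself, so a test of the form `value != np.nan` is always True. Second, assigning to `example_set['check']` updates the entire column rather than the current row. The sketch below is one possible minimal repair of the loop, using `pd.notna` for the test and `.loc` for a row-level update. The exact output format (no leading or trailing underscore) is an assumption, since the expected-output image is not available here.

```python
import numpy as np
import pandas as pd

example_set = pd.DataFrame({
    'A': [3, 4, np.nan],
    'B': [1, np.nan, np.nan],
    'C': [3, 4, 5],
})

# NaN is never equal to anything, itself included, so this test is always True.
print(np.nan != np.nan)

columns = list(example_set.columns)
example_set['check'] = ''

for column in columns:
    for row in example_set.index:
        # pd.notna is a reliable missing-value test, unlike "!= np.nan".
        if pd.notna(example_set.loc[row, column]):
            # Update only this row's cell, not the whole 'check' column.
            example_set.loc[row, 'check'] += column + '_'

# Drop the trailing underscore (assumed output format).
example_set['check'] = example_set['check'].str.rstrip('_')
print(example_set)
```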

Data that I have

Data that I was hoping to get

What I actually get
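
For comparison, here is a sketch of a loop-free way to build such a column: compute a boolean mask of non-missing cells, then join the column names where the mask is True in each row. The name `value_columns` is introduced here for illustration only, and the underscore-joined format is an assumption about the desired result.

```python
import numpy as np
import pandas as pd

example_set = pd.DataFrame({
    'A': [3, 4, np.nan],
    'B': [1, np.nan, np.nan],
    'C': [3, 4, 5],
})

# Hypothetical helper name: the columns to inspect for missing values.
value_columns = ['A', 'B', 'C']

# True where a cell holds a value, False where it is NaN.
mask = example_set[value_columns].notna()

# For each row, keep the column names whose mask entry is True and join them.
example_set['check'] = mask.apply(lambda row: '_'.join(row[row].index), axis=1)
print(example_set)
```

This avoids both the NaN comparison and the whole-column assignment, at the cost of a row-wise `apply`, which should be fine for small frames.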
