I have a list of dataframes that I want to loop through, checking whether a particular string occurs in the id column. If it does, I want to get the index of that row, add a new column, and label every row below that index yes. If the string does not occur in the id column, I want to skip that check for that dataframe and apply a different condition instead. My sample code is below:
import numpy as np
import pandas as pd

sample_df_2 = pd.DataFrame(data={
    'id': ['A', 'B', 'C', 'G', 'D', 'E'],
    'n': [1, 2, 3, 5, 5, 9],
    'v': [10, 13, 8, 8, 4, 3],
    'z': [5, 3, 6, 9, 9, 8]
})
sample_df_1 = pd.DataFrame(data={
    'id': ['L', 'K', 'C', 'G', 'D', 'E'],
    'n': [1, 2, 3, 5, 5, 9],
    'v': [10, 13, 8, 8, 4, 3],
    'z': [5, 3, 6, 9, 9, 8]
})

# Order inferred from the output: the frame containing 'A' comes first.
df_list = [sample_df_2, sample_df_1]

def assign_new_column(data_frame):
    # Intended check: does the id column contain 'A' or 'C'?
    if data_frame['id'].str.contains('A', 'C').any():
        index_A = data_frame[data_frame['id'] == 'A'].index.tolist()
        index_C = data_frame[data_frame['id'] == 'C'].index.tolist()
        data_frame['Yes'] = np.select([data_frame.index <= index_A], ['Yeaaap'])
        data_frame['No'] = np.select([data_frame.index <= index_C], ['Yeaaap'])
    else:
        index_G = data_frame[data_frame['id'] == 'G'].index.tolist()
        index_D = data_frame[data_frame['id'] == 'D'].index.tolist()
        data_frame['Yes'] = np.select([data_frame.index <= index_G], ['Noop'])
        data_frame['No'] = np.select([data_frame.index <= index_D], ['supp'])
    return data_frame

index = sample_df_2[sample_df_2['id'] == 'A'].index.tolist()  # not used below
df = []
for i in range(len(df_list)):
    df.append(assign_new_column(df_list[i]))
pd.concat(df)
This is giving me the following output.
   id  n   v  z     Yes      No
0   A  1  10  5  Yeaaap  Yeaaap
1   B  2  13  3       0  Yeaaap
2   C  3   8  6       0  Yeaaap
3   G  5   8  9       0       0
4   D  5   4  9       0       0
5   E  9   3  8       0       0
0   L  1  10  5    Noop    supp
1   K  2  13  3    Noop    supp
2   C  3   8  6    Noop    supp
3   G  5   8  9    Noop    supp
4   D  5   4  9       0    supp
5   E  9   3  8       0       0
But this is not correct, as it seems to be overwriting the strings.
Can anyone help me solve this in an efficient manner?
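For reference, here is a minimal sketch of one way the behaviour described at the top could be written. It is not a definitive fix and rests on several assumptions: "every row below that index" is read as the matching row plus every row after it, only the Yes column is built, 'G' is used as the fallback marker when 'A' is missing, and the sample frames are trimmed to two columns. The helper name `label_rows_from` and its parameters are hypothetical. Two points about the code above motivate the sketch: in `str.contains('A','C')` the second positional argument is `case`, so only 'A' is actually searched for, and `<=` selects rows at or above the match rather than below it.

```python
import numpy as np
import pandas as pd


def label_rows_from(data_frame, marker, new_column, label, default=""):
    """Label the first row whose id equals marker, and every row after it.

    Returns None when marker is absent so the caller can try another rule.
    (Hypothetical helper; the name and signature are illustrative only.)
    """
    out = data_frame.copy()  # leave the caller's frame untouched
    hits = np.flatnonzero((out["id"] == marker).to_numpy())
    if hits.size == 0:
        return None
    # Work with row positions, not index labels, so repeated labels are harmless.
    positions = np.arange(len(out))
    out[new_column] = np.where(positions >= hits[0], label, default)
    return out


def assign_new_column(data_frame):
    # Assumed rule: try 'A' first, then fall back to 'G' when 'A' is missing.
    labelled = label_rows_from(data_frame, "A", "Yes", "Yeaaap")
    if labelled is None:
        labelled = label_rows_from(data_frame, "G", "Yes", "Noop")
    if labelled is None:
        labelled = data_frame.assign(Yes="")  # neither marker present
    return labelled


# Trimmed copies of the sample frames from the question.
sample_df_2 = pd.DataFrame({"id": ["A", "B", "C", "G", "D", "E"], "n": [1, 2, 3, 5, 5, 9]})
sample_df_1 = pd.DataFrame({"id": ["L", "K", "C", "G", "D", "E"], "n": [1, 2, 3, 5, 5, 9]})

result = pd.concat(
    [assign_new_column(frame) for frame in [sample_df_2, sample_df_1]],
    ignore_index=True,
)
print(result)
```

Both string choices in `np.where` share a dtype, so no numeric 0 placeholder ends up in the column. To test for either of two exact values, `data_frame['id'].isin(['A', 'C'])` is the usual alternative to `str.contains`.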