I have a df with multiple rows. I need to check each value in a column for a specific substring and return the row that matches. I have a set of rules, and they take priority in order.
My sample df:
   file_name                          fil_name
0  02qbhIPSYiHmV_sample_file-MR-job1  02qbhIPSYiHmV
1  02qbhIPSYiHmV_sample_file-MC-job2  02qbhIPSYiHmV
2  02qbhIPSYiHmV_sample_file-job3     02qbhIPSYiHmV
For me, MC takes first priority. If MC is present in any file_name value, take that record. If there is no MC, take the record that has MR in it. If there is neither MC nor MR, just take whatever is there, which in my case is the third row.
I came up with this function:
def choose_best_record(df_t):
    file_names = df_t['file_name']
    for idx, fn in enumerate(file_names):
        lw_fn = fn.lower()
        if '-mc-' in lw_fn:
            get_mc_row = df_t.iloc[idx:idx+1]
            print("Returning MC row")
            return get_mc_row
        else:
            if '-mr-' in lw_fn:
                get_mr_row = df_t.iloc[idx:idx+1]
                print('Returning MR row')
                return get_mr_row
            else:
                normal_row = df_t.iloc[idx:idx+1]
                print('Returning normal row')
                return normal_row
However, this does not behave the way I want. I need the MC row (row index 1), but it returns the MR row instead.
If the rows in my dataframe are in an order like this: ...file-MR-job1, ...file-MR-job1, ...file-MR-job1, then it works. How can I change my function so that it returns the output I need?
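For reference, the likely cause is that every branch inside the loop ends in a return, so the function exits on the first row and never looks at the rest. Below is a minimal sketch of one possible fix, not a definitive one: it checks the whole column for each marker in priority order and only falls back to the first row when nothing matches. The PRIORITY_MARKERS name is made up for illustration, and the sketch assumes that "whatever is there" means the first row of the frame.

```python
import pandas as pd

# Markers in priority order; earlier entries win.
# PRIORITY_MARKERS is a hypothetical name, not part of the original code.
PRIORITY_MARKERS = ['-mc-', '-mr-']


def choose_best_record(df_t):
    """Return a one-row DataFrame: the MC row if any, else the MR row, else the first row."""
    lowered = df_t['file_name'].str.lower()
    for marker in PRIORITY_MARKERS:
        # Plain substring test over the whole column; missing values count as no match.
        mask = lowered.str.contains(marker, regex=False, na=False)
        if mask.any():
            # argmax on a boolean array gives the position of the first True.
            pos = int(mask.to_numpy().argmax())
            return df_t.iloc[pos:pos + 1]
    # Assumption: with no MC or MR anywhere, the first row is the one to keep.
    return df_t.iloc[:1]


df = pd.DataFrame({
    'file_name': [
        '02qbhIPSYiHmV_sample_file-MR-job1',
        '02qbhIPSYiHmV_sample_file-MC-job2',
        '02qbhIPSYiHmV_sample_file-job3',
    ],
    'fil_name': ['02qbhIPSYiHmV'] * 3,
})

best = choose_best_record(df)
print(best)
```

With the sample frame this picks the row at index 1, regardless of where the MC row sits. Adding another rule later should only require appending a marker to the list in the right position.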