I have a DataFrame with multiple rows. I need to check whether a column's value contains a specific substring and return the matching row. I have a set of rules that take priority based on their order.
My sample df:

                           file_name       fil_name
0  02qbhIPSYiHmV_sample_file-MR-job1  02qbhIPSYiHmV
1  02qbhIPSYiHmV_sample_file-MC-job2  02qbhIPSYiHmV
2     02qbhIPSYiHmV_sample_file-job3  02qbhIPSYiHmV
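For anyone who wants to reproduce this, the sample frame can be rebuilt as follows. The values are copied from the table above; only the variable name `df` is an assumption.

```python
import pandas as pd

# Rebuild the sample frame shown above (default RangeIndex 0, 1, 2)
df = pd.DataFrame({
    "file_name": [
        "02qbhIPSYiHmV_sample_file-MR-job1",
        "02qbhIPSYiHmV_sample_file-MC-job2",
        "02qbhIPSYiHmV_sample_file-job3",
    ],
    "fil_name": ["02qbhIPSYiHmV"] * 3,
})
print(df)
```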
For me, MC takes first priority. If MC is present in the file_name value, take that record. If MC is not there, take the record that has MR in it. If there is neither MC nor MR, just take whatever is there; in my case, that is the third row.
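One way to express these rules is to give every row a numeric rank (0 for MC, 1 for MR, 2 for anything else) and pick the row with the lowest rank. The sketch below does that with `numpy.select` and `Series.str.contains`; it reuses the name `choose_best_record` from the question, and it assumes that when no rule matches, the first row is an acceptable fallback.

```python
import numpy as np
import pandas as pd

def choose_best_record(df_t: pd.DataFrame) -> pd.DataFrame:
    """Pick one row: '-mc-' beats '-mr-', which beats anything else."""
    names = df_t["file_name"].str.lower()
    # Lower rank = higher priority; rows matching neither marker get rank 2
    rank = np.select(
        [names.str.contains("-mc-", regex=False),
         names.str.contains("-mr-", regex=False)],
        [0, 1],
        default=2,
    )
    # argmin returns the first position holding the smallest rank,
    # so every row is considered before a winner is chosen
    best_pos = int(rank.argmin())
    return df_t.iloc[[best_pos]]

df = pd.DataFrame({
    "file_name": [
        "02qbhIPSYiHmV_sample_file-MR-job1",
        "02qbhIPSYiHmV_sample_file-MC-job2",
        "02qbhIPSYiHmV_sample_file-job3",
    ],
    "fil_name": ["02qbhIPSYiHmV"] * 3,
})
best = choose_best_record(df)
print(best.index.tolist())  # [1]
```

Because the ranks are computed for all rows before any choice is made, the result no longer depends on which row happens to come first.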
I came up with this function:
def choose_best_record(df_t):
    file_names = df_t['file_name']
    for idx, fn in enumerate(file_names):
        lw_fn = fn.lower()
        # Check the current row for the MC marker first
        if '-mc-' in lw_fn:
            get_mc_row = df_t.iloc[idx:idx+1]
            print("Returning MC row")
            return get_mc_row
        else:
            # Otherwise check the same row for the MR marker
            if '-mr-' in lw_fn:
                get_mr_row = df_t.iloc[idx:idx+1]
                print('Returning MR row')
                return get_mr_row
            else:
                normal_row = df_t.iloc[idx:idx+1]
                print('Returning normal row')
                return normal_row
However, this does not behave the way I want. I need the MC row (row index 1), but it returns the MR row instead.
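The reason is that every branch of the if/else inside the loop ends in `return`, so the function only ever inspects the first row (index 0). That row contains `-mr-`, so the MR branch returns before the MC row is reached. A minimal change that keeps the original loop style is to loop over the rules on the outside and over the rows on the inside, returning only once a rule has been checked against every row. This is a sketch; the fallback to the first row is an assumption.

```python
import pandas as pd

def choose_best_record(df_t):
    file_names = df_t["file_name"].str.lower()
    # Outer loop: rules in priority order. Inner loop: every row.
    for marker, label in (("-mc-", "MC"), ("-mr-", "MR")):
        for idx, fn in enumerate(file_names):
            if marker in fn:
                print(f"Returning {label} row")
                return df_t.iloc[idx:idx + 1]
    # Only reached when no row matched any marker
    print("Returning normal row")
    return df_t.iloc[0:1]

df = pd.DataFrame({
    "file_name": [
        "02qbhIPSYiHmV_sample_file-MR-job1",
        "02qbhIPSYiHmV_sample_file-MC-job2",
        "02qbhIPSYiHmV_sample_file-job3",
    ],
    "fil_name": ["02qbhIPSYiHmV"] * 3,
})
print(choose_best_record(df))  # prints "Returning MC row", then row 1
```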
If the rows in my DataFrame are ordered like this: ...file-MR-job1, ...file-MR-job1, ....file-MR-job1, then it works. How can I change my function to produce the output I need?
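To confirm that a selector does not depend on row order, one option is to run it on every permutation of the rows and check that the same row always wins. The sketch below uses Python's built-in `min` with a key function; the helper `priority` is a hypothetical name, and the code assumes the DataFrame index labels are unique.

```python
from itertools import permutations

import pandas as pd

def priority(name: str) -> int:
    """Hypothetical helper: 0 for '-mc-', 1 for '-mr-', 2 otherwise (lower wins)."""
    lw = name.lower()
    if "-mc-" in lw:
        return 0
    if "-mr-" in lw:
        return 1
    return 2

def choose_best_record(df_t: pd.DataFrame) -> pd.DataFrame:
    # min() scans all rows and keeps the first label with the lowest priority
    best_label = min(df_t.index, key=lambda i: priority(df_t.at[i, "file_name"]))
    return df_t.loc[[best_label]]

df = pd.DataFrame({
    "file_name": [
        "02qbhIPSYiHmV_sample_file-MR-job1",
        "02qbhIPSYiHmV_sample_file-MC-job2",
        "02qbhIPSYiHmV_sample_file-job3",
    ],
    "fil_name": ["02qbhIPSYiHmV"] * 3,
})

# Try every ordering of the three rows: the MC row (label 1) wins each time
winners = {
    choose_best_record(df.loc[list(order)]).index[0]
    for order in permutations(df.index)
}
print(winners)  # {1}
```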