I need to reduce the run time of the following Python code, which contains multiple nested if/else statements. The code runs on Databricks, so I'm open to PySpark solutions as well. It currently takes more than an hour to run, so any help would be appreciated.
unique_list_code: list of the unique codes in the concat_df['C_Code'] column, used to filter the rows of the DataFrame that contain each code.
concat_df: pandas DataFrame with 4 million records.
```python
unique_list_code = list(concat_df['C_Code'].unique())
MC_list = []
SN_list = []
AN_list = []
Nothing_list = []
for i in range(0, len(unique_list_code)):
    print(unique_list_code[i])
    code_filtered_df = concat_df[concat_df['C_Code'] == unique_list_code[i]]
    # SN_Filter:
    SN_filter = code_filtered_df[(code_filtered_df['D_Type'] == 'SN') & (code_filtered_df['Comm_P'] == 'P-mail')]
    if len(SN_filter) > 0:
        print("Found SN")
        SN_list.append(unique_list_code[i])
        clean_up(SN_filter)
    else:
        # AN_Filter
        AN_filter = code_filtered_df[(code_filtered_df['D_Type'] == 'AN') & (code_filtered_df['Comm_P'] == 'P-mail')]
        if len(AN_filter) > 0:
            print("Found AN")
            AN_list.append(unique_list_code[i])
            clean_up(AN_filter)
        else:
            # MC_Check
            MF_filter = code_filtered_df[code_filtered_df['MC_Flag'] == 'Y']
            MF_DNS_filter = MF_filter[~(((MF_filter['D_Type'] == 'AN') | (MF_filter['D_Type'] == 'SN')) & (MF_filter['Comm_P'] == 'DNS'))]
            if len(MF_DNS_filter) > 0:
                print("Found MC")
                MC_list.append(unique_list_code[i])
                clean_up(MF_DNS_filter)
            else:
                print("Nothing Found")
                Nothing_list.append(unique_list_code[i])
```
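Most of the time in the loop above probably goes into re-scanning the whole 4-million-row frame once per code (`concat_df['C_Code'] == ...`). One possible way around that is to evaluate the three conditions once as boolean columns and then reduce them per `C_Code` with a single groupby. The following is a minimal sketch under assumptions, not a drop-in replacement: it uses the column names from the question, `classify_codes` is a hypothetical helper name, and `clean_up` is left out because its behaviour is not shown.

```python
import numpy as np
import pandas as pd


def classify_codes(df: pd.DataFrame) -> pd.Series:
    """Return one label per C_Code: 'SN', 'AN', 'MC' or 'Nothing'.

    Hypothetical helper; the priority order (SN, then AN, then MC)
    mirrors the nested if/else of the original loop.
    """
    p_mail = df["Comm_P"].eq("P-mail")
    sn_rows = df["D_Type"].eq("SN") & p_mail
    an_rows = df["D_Type"].eq("AN") & p_mail
    # MC rows: MC_Flag is 'Y', excluding AN/SN rows whose Comm_P is 'DNS'
    mc_rows = df["MC_Flag"].eq("Y") & ~(
        df["D_Type"].isin(["AN", "SN"]) & df["Comm_P"].eq("DNS")
    )

    # One pass over the data: does each code have at least one matching row?
    flags = pd.DataFrame({"SN": sn_rows, "AN": an_rows, "MC": mc_rows})
    has = flags.groupby(df["C_Code"]).any()

    # np.select takes the first condition that is True, which gives the priority
    labels = np.select(
        [has["SN"], has["AN"], has["MC"]],
        ["SN", "AN", "MC"],
        default="Nothing",
    )
    return pd.Series(labels, index=has.index, name="category")
```

The four lists can then be read off the result, for example `labels = classify_codes(concat_df)` followed by `SN_list = labels.index[labels == "SN"].tolist()`, and likewise for the other labels. If `clean_up` has to see the matching rows of each code separately, it would still need a per-group call (for instance over a groupby of the pre-filtered rows), so how much of the hour is saved depends on what `clean_up` does.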