I want to write a function that compares two tables loaded as DataFrames, and I'd like something quicker than writing one if statement per column. I want to iterate through every column in the DataFrame, and I'm wondering whether I can use the column position (or something similar) instead of adding a configuration entry for each column name. With 100 columns, I don't want to declare every column name in the conf file. Can someone please point me toward a quicker method?
Conf File:
matching_col1= member_id
matching_col2= activation_idn
matching_col3= addition_dt
Actual Code:
val mismatches_df_1 = Df1_renamed_matching.except(Df2_renamed_matching)
if (mismatches_df_1.count() > 0) {
  if (DF1_with_err_cols.matching_col1 != DF2_with_err_cols.matching_col1) {
    insert into mismatches_df_1 VALUES (
      DF1_with_err_cols.matching_col1 as matching_col1,
      ERR_COLUMN = matching_col1,
      ERR_VALUE_SOURCE = DF1_with_err_cols.matching_col1,
      ERR_DESCRIPTION = matching_col1 + " does not match value in " + source_db_jdbc_table_name2)
  } else {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col1 as matching_col1)
  }
  ....
  if (DF1_with_err_cols.matching_col14 != DF2_with_err_cols.matching_col14) {
    insert into mismatches_df_1 VALUES (
      DF1_with_err_cols.matching_col14 as matching_col14,
      ERR_COLUMN = matching_col14,
      ERR_VALUE_SOURCE = DF1_with_err_cols.matching_col14,
      ERR_DESCRIPTION = matching_col14 + " does not match value in " + source_db_jdbc_table_name2)
  } else {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col14 as matching_col14)
  }
}
So instead of writing this out 1-100 times per table/DataFrame, is there a way to do it in a loop, or with a for-each style function based on each column's position?
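For reference, here is a minimal sketch of what such a loop could look like in Spark Scala. It rests on several assumptions that are not stated above: both DataFrames share the same column names, one column (here `member_id`) uniquely identifies a row and can serve as a join key, and only mismatching values need to be reported (the else branch for matching values is left out). The object name `ColumnCompare`, the helper `mismatches`, and the sample data are hypothetical. The column names come from `df1.columns`, which returns them in positional order, so nothing has to be listed in the conf file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ColumnCompare {

  // Hypothetical helper: returns one row per (key, column) pair whose values differ.
  // Assumes `key` exists in both DataFrames and is unique within each.
  def mismatches(df1: DataFrame, df2: DataFrame, key: String, otherTable: String): DataFrame = {
    val joined = df1.alias("a").join(df2.alias("b"), col(s"a.$key") === col(s"b.$key"))

    // df1.columns is an Array[String] in positional order; df1.columns(0) is the first column.
    val compareCols = df1.columns.filter(_ != key)

    // Build one small DataFrame per column, then union them.
    // Note: reduce fails if there are no columns besides the key.
    compareCols.map { c =>
      joined
        .filter(!(col(s"a.$c") <=> col(s"b.$c"))) // <=> is null-safe equality
        .select(
          col(s"a.$key").as(key),
          lit(c).as("ERR_COLUMN"),
          col(s"a.$c").cast("string").as("ERR_VALUE_SOURCE"),
          lit(s"$c does not match value in $otherTable").as("ERR_DESCRIPTION"))
    }.reduce(_ union _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("compare").getOrCreate()
    import spark.implicits._

    // Made-up sample rows using the column names from the conf file.
    val df1 = Seq((1, "A1", "2020-01-01"), (2, "B2", "2020-02-01"))
      .toDF("member_id", "activation_idn", "addition_dt")
    val df2 = Seq((1, "A1", "2020-01-01"), (2, "XX", "2020-02-01"))
      .toDF("member_id", "activation_idn", "addition_dt")

    mismatches(df1, df2, "member_id", "table2").show(false)
    spark.stop()
  }
}
```

Casting the source value to string keeps the `ERR_VALUE_SOURCE` column the same type across all columns so the per-column results can be unioned. With very wide tables, unioning 100 filtered joins may be slow, so this is only a starting point rather than a tuned solution.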
Any help would be appreciated!