Sunday, April 19, 2020

Iterate through each column in a dataframe and perform a compare

I want to write a function that compares two tables loaded as dataframes, and I'm looking for a quicker approach than one if statement per column. I'd like to iterate through each column in the dataframe, ideally by column position or something similar, instead of adding a configuration entry for each column name. With 100 columns I don't want to declare every column name in the conf file. Can someone please point me toward a quicker method?

Conf File:

matching_col1= member_id
matching_col2= activation_idn
matching_col3= addition_dt

Actual Code:

val mismatches_df_1 = Df1_renamed_matching.except(Df2_renamed_matching)
if (mismatches_df_1.count() > 0) {

  if (DF1_with_err_cols.matching_col1 != DF2_with_err_cols.matching_col1) {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col1 as matching_col1,
      ERR_COLUMN = matching_col1,
      ERR_VALUE_SOURCE = DF1_with_err_cols.matching_col1,
      ERR_DESCRIPTION = matching_col1 + " does not match value in " + source_db_jdbc_table_name2)
  } else {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col1 as matching_col1)
  }
  ....
  if (DF1_with_err_cols.matching_col14 != DF2_with_err_cols.matching_col14) {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col14 as matching_col14,
      ERR_COLUMN = matching_col14,
      ERR_VALUE_SOURCE = DF1_with_err_cols.matching_col14,
      ERR_DESCRIPTION = matching_col14 + " does not match value in " + source_db_jdbc_table_name2)
  } else {
    insert into mismatches_df_1 VALUES (DF1_with_err_cols.matching_col14 as matching_col14)
  }
}

So instead of repeating this up to 100 times per table/dataframe, is there a way to write a loop, or a for-each style function based on each column's position?

Any help would be appreciated!
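
For what it's worth, here is a minimal sketch of one way the loop could look, not a definitive implementation. It assumes Spark 2.x or later, that both dataframes share the same column names, and that one column (member_id here, which is a guess) uniquely identifies a row. The names ColumnCompare, mismatches, keyCol and table2Name are hypothetical; ERR_COLUMN, ERR_VALUE_SOURCE and ERR_DESCRIPTION are taken from the pseudocode above. The column names come from df1.columns, so nothing has to be listed in the conf file; df1.columns(i) would give the name at position i if a positional lookup is needed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ColumnCompare {

  // Hypothetical helper: returns one row per (key, column) pair whose values differ.
  // Assumes keyCol exists in both dataframes and at least one other column is present.
  def mismatches(df1: DataFrame, df2: DataFrame, keyCol: String, table2Name: String): DataFrame = {
    // Pair up rows from both tables by the key column.
    val joined = df1.alias("a").join(df2.alias("b"), col(s"a.$keyCol") === col(s"b.$keyCol"))

    // df1.columns is an Array[String] in positional order; loop over it
    // instead of declaring every column name in configuration.
    val perColumn = df1.columns.filter(_ != keyCol).map { c =>
      joined
        // <=> is null-safe equality, so two nulls count as a match.
        .filter(!(col(s"a.$c") <=> col(s"b.$c")))
        .select(
          col(s"a.$keyCol").as(keyCol),
          lit(c).as("ERR_COLUMN"),
          // Cast to string so columns of different types can be unioned.
          col(s"a.$c").cast("string").as("ERR_VALUE_SOURCE"),
          lit(s"$c does not match value in $table2Name").as("ERR_DESCRIPTION"))
    }

    // Stack the per-column results into a single mismatch dataframe.
    perColumn.reduce(_ union _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-compare").getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column names from the conf file above.
    val df1 = Seq((1, "A1", "2020-01-01"), (2, "B2", "2020-02-01"))
      .toDF("member_id", "activation_idn", "addition_dt")
    val df2 = Seq((1, "A1", "2020-01-01"), (2, "B9", "2020-02-01"))
      .toDF("member_id", "activation_idn", "addition_dt")

    // Only member_id 2 / activation_idn differs between the two samples.
    mismatches(df1, df2, "member_id", "table2").show(false)

    spark.stop()
  }
}
```

This builds one filter per column and unions them, which keeps the code short but plans a separate scan of the joined data for each column. With 100 wide columns it may be worth caching the joined dataframe first. Rows that exist in only one table are dropped by the inner join and would need a separate check, for example with except as in the original code.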
