I am trying to make this loop work, where I compare the value of an approx_count_distinct to a threshold. I would like to execute the if statement when the distinct count is < 2, but it always returns "NULL", even though when I print approx I get the right results (values smaller than 2). What am I doing wrong?
from pyspark.sql.functions import approx_count_distinct

for col in s:
    approx = df.agg(approx_count_distinct(col).alias("count"))
    if approx.collect()[0] < 2:
        print(col)