Friday, November 2, 2018

Pyspark: compare values and if true execute statement

I am trying to make this loop work: for each column, I compare the value of an approx_count_distinct aggregation to a threshold, and I want to execute the body of the if statement when the distinct count is < 2. But it always returns "NULL", even though when I print approx I get the correct results (values smaller than 2). What am I doing wrong?

for col in s:
    approx = df.agg(approx_count_distinct(col).alias("count"))
    if approx.collect()[0] < 2:
        print(col)
