Tuesday, December 24, 2019

Spark DataFrame size check on columns does not work as expected using Scala

I do not want to use foldLeft, or withColumn with when, over all the columns in a DataFrame; instead I want a single select, as per https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015, built with an if/else statement over the columns passed as varargs. All I want is to replace each empty array column in a Spark DataFrame with null, using Scala. I am using size, but the zero (0) check never matches.

val resDF2 = aggDF.select(cols.map { col =>
  (if (size(aggDF(col)) == 0) lit(null) else aggDF(col)).as(s"$col")
}: _*)

if (size(aggDF(col)) == 0) lit(null) has no effect here, although the code runs, and size(aggDF(col)) does return the correct length if I select it directly.
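The likely cause: size(aggDF(col)) returns a Column object describing an expression, not an Int, so the Scala-level comparison == 0 is decided once, on the driver, while the plan is being built, and it is always false; the else branch is baked in for every row. The sketch below illustrates this with a hypothetical mini Column type (Col, size, when, lit are stand-ins, not Spark itself), contrasting a driver-side if with a column-level conditional:

```scala
// A Col wraps a per-row computation; nothing runs until eval is called,
// mimicking how a Spark Column describes an expression rather than a value.
final case class Col[A](eval: Map[String, Any] => A)

// Stand-in for Spark's size(): length of an array-typed column.
def size(name: String): Col[Int] =
  Col(row => row(name).asInstanceOf[Seq[Int]].length)

// Stand-in for lit(): a constant column.
def lit[A](a: A): Col[A] = Col(_ => a)

// Stand-in for when(...).otherwise(...): the branch is chosen per row.
def when[A](cond: Col[Boolean], thenCol: Col[A], otherwiseCol: Col[A]): Col[A] =
  Col(row => if (cond.eval(row)) thenCol.eval(row) else otherwiseCol.eval(row))

val row = Map[String, Any]("xs" -> Seq.empty[Int])

// Driver-side if: compares a Col object to the Int 0 -- always false,
// so the else branch is selected once, before any row is seen.
val broken = if (size("xs") == 0) lit(-1) else size("xs")

// Column-level conditional: the emptiness test runs against each row's data.
val fixed = when(Col(r => size("xs").eval(r) == 0), lit(-1), size("xs"))

println(broken.eval(row)) // 0   (else branch was baked into the plan)
println(fixed.eval(row))  // -1  (condition actually saw the empty array)
```

In real Spark the row-level conditional the author is after would be when(size(aggDF(col)) === 0, lit(null)).otherwise(aggDF(col)) inside the same select, which still avoids the per-column withColumn cost the linked article warns about.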

I am wondering what the silly issue is. Must be something I am obviously overlooking!
