Thursday, August 5, 2021

How to use a condition inside foldLeft in Spark Scala?

Error: `Type mismatch. Required: (sql.DataFrame, String) => sql.DataFrame, found: (sql.DataFrame, String) => Any`

I am trying to traverse all the columns of a DataFrame, so I used `foldLeft`. I need to replace data depending on the column type: for example, if the column is of Integer type, perform one operation, and if it is of another type, perform a different one. However, I get a type mismatch error as soon as I use conditions inside `foldLeft`. Could someone please help?

```scala
val actualDF = nonullDF
    .columns
    .foldLeft(nonullDF) { (memoDF, colName) =>
      if (memoDF.schema("colName").dataType == IntegerType) {
        memoDF.withColumn(
          colName,
          when(col("colName") === "?",
            (memoDF.select(avg("colName")).head().getInt(0)))
            .otherwise(col("colName")))
      }
      else if (memoDF.schema("colName").dataType == DoubleType) {
        memoDF.withColumn(
          colName,
          when(col("colName") === "?",
            (memoDF.select(avg("colName")).head().getDouble(0)))
            .otherwise(col("colName")))
      }
      else if (memoDF.schema("colName").dataType == StringType) {
        memoDF.withColumn(
          colName,
          when(col("colName") === "?", memoDF.groupBy(col("colName")).count().orderBy(desc("count")).first()(0))
            .otherwise(col("colName")))
      }
    }
```
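
A likely cause of the mismatch: the `if` / `else if` chain has no final `else`, so when no condition matches the expression evaluates to `Unit`. The compiler then types the lambda as the common supertype of `DataFrame` and `Unit`, which is `Any`, while `foldLeft` needs a function that returns a `DataFrame`. Two other things stand out in the posted code: `"colName"` is a string literal rather than the `colName` variable, and `avg` yields a Double, so `getInt(0)` on its result would fail at runtime.

The following is a minimal sketch of one way to restructure it, not a definitive implementation. The names `FillByColumnType`, `fillByColumnType`, `mean` and `mostFrequent` are hypothetical. It also assumes that missing values in numeric columns show up as nulls (a column typed as Integer or Double cannot hold the string `"?"`), while string columns still contain the `"?"` placeholder.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, desc, lit, when}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType}

object FillByColumnType {

  // Mean of a numeric column, or None when it has no non-null values.
  private def mean(df: DataFrame, colName: String): Option[Double] = {
    val row = df.select(avg(col(colName))).head() // avg yields a Double
    if (row.isNullAt(0)) None else Some(row.getDouble(0))
  }

  // Most frequent value of a string column, ignoring "?" and nulls.
  private def mostFrequent(df: DataFrame, colName: String): Option[String] =
    df.filter(col(colName) =!= "?")
      .groupBy(col(colName))
      .count()
      .orderBy(desc("count"))
      .head(1)
      .headOption
      .map(_.getString(0))

  def fillByColumnType(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (memoDF, colName) =>
      // Look up the schema with the colName variable, not the literal "colName".
      memoDF.schema(colName).dataType match {
        case IntegerType =>
          // Assumption: missing integers are nulls; fill with the rounded mean.
          mean(memoDF, colName).map { m =>
            memoDF.withColumn(colName,
              when(col(colName).isNull, lit(math.round(m).toInt)).otherwise(col(colName)))
          }.getOrElse(memoDF)

        case DoubleType =>
          // Assumption: missing doubles are nulls; fill with the mean.
          mean(memoDF, colName).map { m =>
            memoDF.withColumn(colName,
              when(col(colName).isNull, lit(m)).otherwise(col(colName)))
          }.getOrElse(memoDF)

        case StringType =>
          // Replace the "?" placeholder with the most frequent other value.
          mostFrequent(memoDF, colName).map { v =>
            memoDF.withColumn(colName,
              when(col(colName) === "?", lit(v)).otherwise(col(colName)))
          }.getOrElse(memoDF)

        // Default case: every branch now returns a DataFrame,
        // so the lambda is typed (DataFrame, String) => DataFrame.
        case _ => memoDF
      }
    }
}
```

With this in place, the original call would become `val actualDF = FillByColumnType.fillByColumnType(nonullDF)`. The same effect can be had by keeping the `if` / `else if` chain and simply ending it with `else memoDF`; the pattern match is only a stylistic choice. Note that each mean or mode lookup triggers a Spark job, so on wide tables it may be worth computing the statistics in a single pass first.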
