I have the DataFrame below, which I have split into two parts: 1) filter the accounts and 2) perform the calculations.

Query: the second set of operations should only run for the accounts kept in df; in other words, if a row belongs to one of these accounts, do the calculations, otherwise leave the row as it is. How can I combine the two parts into a single if/else-style condition? At the moment the filter deletes every other account, which is what I want to avoid. (One possible way to combine them is sketched after the two snippets below.)
from pyspark.sql import functions as F

# Step 1: keep only the listed accounts (this is the step that drops the other accounts)
df = df1.where(F.col('Account').isin('Acc1', 'Acc2', 'Acc3', 'Acc4', 'Acc5'))
# Step 2: the calculations, currently applied to the filtered df only
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w1 = Window.orderBy("mono_id")

df2 = (
    df
    .withColumn("mono_id", F.monotonically_increasing_id())  # row id used to order the window
    .withColumn("Price1", F.col('rate') * F.col('amt1'))
    .withColumn("Price2", F.col('rate') * F.col('amt2'))
    .withColumn("price3", F.lag(F.col('amt3')).over(w1))     # lag needs an explicit window
)