Wednesday, 15 April 2020

Data transformations in PySpark (Python)

How to read statements in Python functions?

I want to recode my data (a million records) as L = 0, M = 1, H = 2, C = 3, because I'll compute an average later. I use 'Order Priority' as the key, but I can't get reduceByKey to work while the key is a string.

My data looks like the following:

+--------------+------------+
|Order Priority|  Units Sold|
+--------------+------------+
|M             |1593        |
|M             |4611        |
|C             |7676        |
|H             |4790        |
|L             |3973        |
+--------------+------------+

L = Low, M = Medium, H = High, C = Critical

This is my code.py:

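from pyspark import SparkContext

# Reuse the running SparkContext, or create one if none exists yet.
sc = SparkContext.getOrCreate()


def parseLine(line):
    # Each line looks like "M,1593": priority letter, then units sold.
    fields = line.split(',')
    priority = fields[0]
    sold = float(fields[1])
    return (priority, sold)


# Read the CSV as plain text in 4 partitions and parse every line.
lines = sc.textFile("file:///SparkCourse/project/1MillSalesRecords.csv", 4)
rdd = lines.map(parseLine)

print(rdd.take(2))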

Result:

[('M', 1593.0), ('M', 4611.0)]
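
One possible way to get from here to a per-priority average is sketched below. It is only an illustration under a few assumptions: the dictionary name PRIORITY_CODES and the helpers parse_line, add_pairs and average_units_by_priority are hypothetical names that do not appear in the question, and the CSV is assumed to hold exactly the two columns shown above. The idea is to map each letter to its number while parsing, carry a (sum, count) pair per record, add the pairs with reduceByKey, and divide at the end.

```python
# Encoding taken from the question: L = 0, M = 1, H = 2, C = 3.
PRIORITY_CODES = {"L": 0, "M": 1, "H": 2, "C": 3}


def parse_line(line):
    """Turn a line such as 'M,1593' into (1, 1593.0)."""
    fields = line.split(",")
    # Look up the numeric code for the priority letter (hypothetical mapping).
    priority = PRIORITY_CODES[fields[0].strip().upper()]
    sold = float(fields[1])
    return (priority, sold)


def add_pairs(a, b):
    """Combine two (sum, count) pairs element-wise."""
    return (a[0] + b[0], a[1] + b[1])


def average_units_by_priority(lines):
    """Given an RDD of CSV lines, return {priority_code: mean units sold}."""
    return (
        lines.map(parse_line)
        .mapValues(lambda sold: (sold, 1))            # value -> (sum, count)
        .reduceByKey(add_pairs)                       # add sums and counts per key
        .mapValues(lambda pair: pair[0] / pair[1])    # sum / count
        .collectAsMap()
    )
```

With the `lines` RDD from the code above, this would be called as `average_units_by_priority(lines)`. As far as I know, reduceByKey only needs hashable keys, so the letters themselves may also work as keys; the numeric recoding is kept here because the question asks for it.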
