Tuesday, May 9, 2017

How to use an if statement to generate similar lines of code in Python

I have a Python file called test.py. In this file I execute some PySpark commands.

#!/usr/bin/env python

import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# create a data frame from hive tables
df = sqlContext.table("testing.test")

# register the data frame as temp table
df.registerTempTable('mytempTable')

# find number of records in data frame
records = df.count()

print("records='%s'" % records)

Now I want to do the following.

If records <= 1000000, run:
sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb,table))    


If records > 1000000, run:
sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable where id <= 1000000".format(hivedb,table))
sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 1000000 and id <= 2000000".format(hivedb,table))
sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 2000000 and id <= 3000000".format(hivedb,table))
sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 3000000 and id <= 4000000".format(hivedb,table))
sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 4000000 and id <= 5000000".format(hivedb,table))
sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 5000000 and id <= 6000000".format(hivedb,table))
and so on, until the last million records are covered.

How can I use an if statement in this script?

How can I generate these similar lines of code up to the last million records?

As you can see, there is a lot of manual work involved.

I am new to Python and still learning it. Is there a way to simplify the code?
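
One possible way to approach this, as a minimal sketch rather than a definitive solution: build the SQL strings with an if statement plus a loop over range(), then run each string. The function name build_statements and its chunk and source parameters are hypothetical names introduced for illustration. The sketch assumes that id values run from 1 up to the record count without gaps (which the statements above also rely on), and that hivedb and table are defined elsewhere in the script.

```python
def build_statements(records, hivedb, table, chunk=1000000, source="mytempTable"):
    """Return the SQL statements that copy `source` into hivedb.table.

    Hypothetical helper: assumes ids run from 1 to `records` with no gaps.
    """
    target = "{}.{}".format(hivedb, table)

    # Small table: a single create-table-as-select is enough.
    if records <= chunk:
        return [
            "create table {} stored as parquet as select * from {}".format(
                target, source
            )
        ]

    # Large table: create it from the first chunk of ids ...
    statements = [
        "create table {} stored as parquet as select * from {} "
        "where id <= {}".format(target, source, chunk)
    ]

    # ... then append one insert per remaining chunk.
    # range(chunk, records, chunk) yields the lower bounds 1000000, 2000000, ...
    for lower in range(chunk, records, chunk):
        upper = lower + chunk
        statements.append(
            "insert into table {} select * from {} "
            "where id > {} and id <= {}".format(target, source, lower, upper)
        )
    return statements
```

In test.py, the generated statements could then be executed one by one, for example with: for stmt in build_statements(records, hivedb, table): sqlContext.sql(stmt)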
