Tuesday, May 9, 2017

How to generate lines of code automatically in Python scripts

I have a Python file called test.py in which I execute some PySpark commands.

#!/usr/bin/env python

import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# create a data frame from hive tables
df = sqlContext.table("testing.test")

# register the data frame as temp table
df.registerTempTable('mytempTable')

# find number of records in data frame
records = df.count()

print("records='%s'" % records)


# target Hive database and table (hypothetical placeholder names; replace with your own)
hivedb = "mydb"
table = "mytable"

if records < 1000000:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb, table))
else:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable where id <= 1000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 1000000 and id <= 2000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 2000000 and id <= 3000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 3000000 and id <= 4000000".format(hivedb, table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 4000000 and id <= 5000000".format(hivedb, table))
    # ... and so on until the last million

In the if-else statement, I wrote the code in the else branch by hand.

I want the script to generate that part of the code automatically.

How can I generate similar lines of code up to the last million records?

I am new to Python and still learning it. Is there a way to simplify this code?
