Saturday, February 2, 2019

Improve the speed of a for loop reading lines from a very big file: Python

I am trying to speed up a for loop that reads lines from a very large file. I have two files: I read the first file line by line in a for loop and, for each of its lines, match against every line of the second file with an if statement. Since both files have millions of lines, this takes far too long. I am posting my code below, with its for and if statements, and would be grateful for any suggestions on how to make the loop faster.

Thanks

#!/usr/bin/python

# open file1 (FIMO motif hits, GFF format)
f1 = open("../../Reapeat_analysis/FIMO/fimo_out/fimo.gff", 'r')
# open file2 (Bismark cytosine report)
f2 = open("../BS_Forward.fastq_bismark_pe.CX_report.txt", 'r')

# open the output file
fOut = open("output_sample_CG+.txt", 'w')

# read file1 line by line
for line1 in f1:
    line1 = line1.split('\t')
    s0 = line1[0]        # sequence/chromosome name
    s1 = int(line1[3])   # interval start
    s2 = int(line1[4])   # interval end
    count = 0
    lt = []

    # re-scan every line of file2 for each line of file1
    for line2 in f2:
        line2 = line2.split('\t')

        # keep CG-context sites on the + strand that fall inside the interval
        if line2[0] == s0 and s1 <= int(line2[1]) <= s2 and line2[5] == "CG" and line2[2] == "+":
            lt.append(line2)
            count += 1

    # save the matches for this interval
    fOut.write(str(s1) + "-" + str(s2) + ":" + s0 + "\t" + str(count) + "\t" + str(lt) + "\n")
    f2.seek(0)  # rewind file2 for the next file1 line

f1.close()
f2.close()
fOut.close()
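One way to avoid re-reading file2 for every line of file1 is to read it once, keep only the CG/+ rows, and index their positions per chromosome as sorted lists; each file1 interval then costs two binary searches instead of a full scan. A minimal sketch of that idea, assuming the same column layout as the code above (column 0 = chromosome, 1 = position, 2 = strand, 5 = context); the helper names are illustrative, not from the original post:

```python
import bisect
from collections import defaultdict

def index_report(report_lines):
    """Build {chromosome: sorted list of positions} from CX-report lines,
    keeping only CG-context sites on the + strand (one pass over file2)."""
    index = defaultdict(list)
    for raw in report_lines:
        cols = raw.rstrip('\n').split('\t')
        if cols[2] == '+' and cols[5] == 'CG':
            index[cols[0]].append(int(cols[1]))
    for positions in index.values():
        positions.sort()
    return index

def count_in_interval(index, chrom, start, end):
    """Count indexed positions with start <= pos <= end via binary search."""
    positions = index.get(chrom, [])
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return hi - lo
```

With the index built once (e.g. `index = index_report(f2)`), the file1 loop no longer needs `f2.seek(0)`: each interval becomes one `count_in_interval(index, s0, s1, s2)` call, turning the quadratic scan into a single pass over each file plus logarithmic lookups. If the matched rows themselves are needed (the `lt` list), the dict can store `(position, full_row)` pairs instead of bare positions.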
