Tuesday, June 2, 2015

Pull information from the last header line inside an if/else statement within a for loop in Python

I don't think this is possible, but I figured I would ask just in case. I am trying to write a memory-efficient Python program for parsing files that are typically 100+ GB in size. What I want to do is use a single for loop to read in a line, split it on various characters multiple times, and write the result, all within the same loop.

The trick is that the file has lines that start with "#". These are not important, except for the last line that starts with "#", which is the header of the file. I want to be able to pull information from that last line because it contains the sample names.

for line in seqfile:
    line = line.rstrip()
    if line.startswith("#"):
        # skip, unless this is the last line that starts with "#"
        # for that last line only:
        #     SampleNames = lastline[8:-1]
        #     newheader.write(<new header with sample names>)
        continue
    else:
        columns = line.split("\t")
        # then do more splitting
        # then write

If this is not possible, then the only alternative I can think of is to store the lines that start with "#" (which can still be 5 GB in size) and then go back and write to the beginning of the file. I believe that can't be done directly, but if there is a memory-efficient way to do it, that would be nice.
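For illustration, here is a minimal sketch of one way the single-pass idea could work: remember only the most recent "#" line, and treat the first line that does not start with "#" as the signal that the remembered line was the header. It assumes that all "#" lines come before the data lines and that the header is tab-separated. The function name `convert`, the `out` handle, and applying the `[8:-1]` slice from the question to the tab-split fields are all assumptions made for this example, not something stated above.

```python
import io


def convert(seqfile, out):
    """Stream seqfile to out, holding at most one '#' line in memory.

    Assumption: every '#' line appears before the first data line.
    """
    last_header = None       # most recent line that started with "#"
    header_written = False   # becomes True once the first data line is seen

    def write_header():
        # Assumption: the header is tab-separated and the sample names
        # are the fields selected by the question's [8:-1] slice.
        sample_names = last_header.split("\t")[8:-1]
        out.write("\t".join(sample_names) + "\n")

    for line in seqfile:
        line = line.rstrip("\n")
        if not header_written and line.startswith("#"):
            # Overwrite the previous '#' line; only the last one survives.
            last_header = line
            continue
        if not header_written:
            # First data line: the remembered '#' line was the header.
            if last_header is not None:
                write_header()
            header_written = True
        columns = line.split("\t")
        # Placeholder for the further splitting described in the question.
        out.write("\t".join(columns) + "\n")

    # File with '#' lines but no data lines: still emit the header.
    if not header_written and last_header is not None:
        write_header()
```

With real files this might be called as `convert(open("in.txt"), open("out.txt", "w"))`, where both file names are placeholders. Because each "#" line replaces the previous one, memory use stays at roughly one line regardless of how large the "#" section is, and nothing has to be written back to the beginning of the output.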

Any help would be greatly appreciated.

Thank you
