I don't think this is possible, but I figured I would ask just in case. I am trying to write a memory-efficient Python program for parsing files that are typically 100+ GB in size. What I want to do is use a for loop to read in a line, split it on various characters multiple times, and write the result, all within the same loop.
The trick is that the file has lines starting with "#", which are unimportant except for the last such line, which is the header of the file. I want to be able to pull information from that last "#" line because it contains the sample names.
for line in seqfile:
    line = line.rstrip()
    if line.startswith("#"):
        continue  # unless it's the last line that starts with "#", in which case:
        # SampleNames = lastline[8:-1]
        # newheader.write(new header with sample names)
    else:
        columns = line.split("\t")
        # then do more splitting
        # then write
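One way to make the sketch above work in a single pass is to buffer the most recent "#" line: by the time the first data line appears, the buffered line is necessarily the last "#" line, i.e. the header. A minimal sketch (the function name `parse_seqfile` is my own, and the `[9:]` slice assumes a VCF-like layout where sample names follow the FORMAT column; adjust for your format):

```python
import io

def parse_seqfile(seqfile, out):
    """Single pass over the file, constant memory: remember only the
    most recent '#' line. When the first data line arrives, that
    remembered line is the header containing the sample names."""
    sample_names = None
    last_hash_line = None
    for line in seqfile:
        line = line.rstrip("\n")
        if line.startswith("#"):
            last_hash_line = line          # may turn out to be the header
            continue
        if sample_names is None and last_hash_line is not None:
            # First data line: the last '#' line seen is the header.
            # Assumption: sample names are the columns after FORMAT.
            sample_names = last_hash_line.split("\t")[9:]
            out.write("New header: " + "\t".join(sample_names) + "\n")
        columns = line.split("\t")
        # ... do more splitting on individual columns here ...
        out.write("\t".join(columns) + "\n")
    return sample_names
```

This never stores more than one "#" line at a time, so memory use stays flat no matter how large the commented preamble is.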
If this is not possible, then the only other alternative I can think of is to store the lines starting with "#" (which can still be 5 GB in size), then go back and write to the beginning of the file, which I believe can't be done directly. But if there is a way to do that memory-efficiently, it would be nice.
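You indeed can't prepend to a file in place, but you don't need to hold 5 GB of "#" lines either: only the last one matters. If the header must come first in the output, one constant-memory pattern is to stream the processed data rows to a temporary file while scanning, then write the header followed by a streamed copy of that temp file. A sketch under those assumptions (function name and layout are my own):

```python
import os
import shutil
import tempfile

def write_with_header(src_path, dst_path):
    """Pass 1: stream data rows to a temp file, remembering only the
    most recent '#' line. Pass 2: write the header, then copy the temp
    file after it with shutil.copyfileobj (chunked, constant memory)."""
    header = None
    last_hash_line = None
    with open(src_path) as src, tempfile.NamedTemporaryFile(
            "w", delete=False,
            dir=os.path.dirname(os.path.abspath(dst_path))) as body:
        for line in src:
            if line.startswith("#"):
                last_hash_line = line       # candidate header
            else:
                if header is None:
                    header = last_hash_line  # last '#' line seen wins
                body.write(line)             # processed rows go here
        tmp_name = body.name
    with open(dst_path, "w") as dst, open(tmp_name) as body_in:
        if header is not None:
            dst.write(header)
        shutil.copyfileobj(body_in, dst)     # streamed, never in memory
    os.remove(tmp_name)
```

The cost is one extra pass of disk I/O over the data rows, but memory use stays at a single line regardless of file size.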
Any help would be greatly appreciated.
Thank you