Monday, December 14, 2015

Nested loops iterating on a single file

I want to delete some specific lines from a file. The part to delete is enclosed between two lines, STARTING_LINE and CLOSING_LINE, which are deleted too. If there is no closing line before the end of the file, the operation should stop.

Example:

...blabla...
[Start] <-- # STARTING_LINE
This is the body that I want to delete
[End] <-- # CLOSING_LINE
...blabla...
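The functions in this post compare lines against STARTING_LINE and CLOSING_LINE, but the snippets never define them. Here is a minimal setup sketch. It assumes the markers are exactly the [Start] and [End] lines from the example; the "<-- # ..." annotations are only commentary. Because every comparison calls .strip() on both sides, a trailing newline in the constants would not matter. The names SAMPLE_TEXT and write_sample_file are hypothetical.

```python
# Assumed marker values, taken from the example above (hypothetical).
STARTING_LINE = "[Start]"
CLOSING_LINE = "[End]"

# Sample content shaped like the example, for trying the functions out.
SAMPLE_TEXT = (
    "...blabla...\n"
    "[Start]\n"
    "This is the body that I want to delete\n"
    "[End]\n"
    "...blabla...\n"
)


def write_sample_file(filename):
    """Hypothetical helper: create a small test file like the example."""
    with open(filename, "w") as f:
        f.write(SAMPLE_TEXT)
```

For example, calling write_sample_file("test.txt") before one of the delete_lines functions should leave only the two "...blabla..." lines.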

I came up with three different ways to do this, but I am wondering which one is best.

1. A lot of if conditions (just one for loop):

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        found_start = False
        found_end = False

        for line in my_file:
            if not found_start and line.strip() == STARTING_LINE.strip():
                found_start = True
            elif found_start and not found_end:
                if line.strip() == CLOSING_LINE.strip():
                    found_end = True
                continue
            else:
                print(line)
                text += line

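        # No closing line before the end of the file: stop without rewriting
        if found_start and not found_end:
            print("There is no closing line. Stopping")
            return False

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)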
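Because io.StringIO supports iteration, seek(), truncate() and write(), the version 1 logic can be tested in memory without touching a real file. The sketch below is a variant of version 1, not a copy of it. A single hypothetical in_block flag replaces found_start and found_end, so every block is removed. Version 1 as written stops matching after the first block. An unclosed block makes the function return False without rewriting anything. The function name delete_blocks_in_stream is hypothetical.

```python
import io


def delete_blocks_in_stream(stream, starting_line, closing_line):
    """Variant of version 1 for any readable/writable text stream.

    Hypothetical helper: one in_block flag replaces found_start/found_end,
    so every block is removed, not only the first one. Returns False and
    leaves the stream untouched if a block is never closed.
    """
    start = starting_line.strip()
    end = closing_line.strip()
    kept = []
    in_block = False
    for line in stream:
        stripped = line.strip()
        if in_block:
            if stripped == end:
                in_block = False  # the closing line is dropped too
        elif stripped == start:
            in_block = True  # the starting line is dropped
        else:
            kept.append(line)
    if in_block:
        return False  # no closing line before the end: do not rewrite
    stream.seek(0)
    stream.truncate()
    stream.write("".join(kept))
    return True


buf = io.StringIO("a\n[Start]\nbody\n[End]\nb\n[Start]\nx\n[End]\nc\n")
print(delete_blocks_in_stream(buf, "[Start]", "[End]"))  # True
print(repr(buf.getvalue()))  # 'a\nb\nc\n'
```

Collecting the kept lines in a list and joining them once also avoids building the result with repeated string concatenation (text += line).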

2. Nested for loops on the open file:

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the closing line.
                # Note: the next `for` loop iterates on the following lines, not
                # on the entire my_file (i.e. it does not start again from the
                # first line). This avoids having to handle the StopIteration
                # exception manually.
                found_end = False
                for function_line in my_file:
                    if function_line.strip() == CLOSING_LINE.strip():
                        print("stop")
                        found_end = True
                        break
                if not found_end:
                    print("There is no closing line. Stopping")
                    return False
            else:
                text += line

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
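Version 2 relies on the fact that a file object is its own iterator. The inner for loop therefore resumes right after the starting line instead of restarting from the top. Python's for ... else clause can also replace the found_end flag, because the else branch runs only when the loop finishes without hitting break. Here is a minimal sketch of both ideas on a plain list of lines. The helper strip_blocks and its default markers are hypothetical.

```python
def strip_blocks(lines, start="[Start]", end="[End]"):
    """Return the lines outside start...end blocks, or None if a block is unclosed.

    Hypothetical helper mirroring version 2; lines can be a file or any iterable.
    """
    it = iter(lines)  # a file object is already its own iterator
    kept = []
    for line in it:
        if line.strip() == start:
            # The inner loop pulls from the SAME iterator: it resumes right
            # after the starting line, and after break the outer loop resumes
            # right after the closing line.
            for inner in it:
                if inner.strip() == end:
                    break
            else:
                # Runs only if the inner loop ended without break,
                # i.e. the input ended before the closing line.
                return None
        else:
            kept.append(line)
    return kept


print(strip_blocks(["keep 1", "[Start]", "drop me", "[End]", "keep 2"]))
# ['keep 1', 'keep 2']
print(strip_blocks(["keep 1", "[Start]", "never closed"]))
# None
```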

3. while True and next() (handling the StopIteration exception):

def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the closing line
                while True:
                    try:
                        line = next(my_file)
                        if line.strip() == CLOSING_LINE.strip():
                            print("stop")
                            break
                    except StopIteration:
                        # Without this return, next() keeps raising and the
                        # while loop never ends
                        print("There is no closing line. Stopping")
                        return False
            else:
                text += line

        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
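In version 3, next()'s optional default argument can replace the try/except. next(iterator, default) returns default once the iterator is exhausted instead of raising StopIteration. Here is a sketch of just the skipping step, written as a hypothetical helper skip_block. None works as the sentinel because iterating a file only ever yields strings.

```python
def skip_block(iterator, closing_line):
    """Consume lines up to and including closing_line.

    Hypothetical helper. Returns True if the closing line was found, and
    False if the iterator ran out first (the "no closing line" case).
    """
    end = closing_line.strip()
    while True:
        line = next(iterator, None)  # None at end of input, no exception
        if line is None:
            return False
        if line.strip() == end:
            return True


lines = iter(["body\n", "[End]\n", "after\n"])
print(skip_block(lines, "[End]"))  # True
print(repr(next(lines)))  # 'after\n'
print(skip_block(iter(["body\n"]), "[End]"))  # False
```

Inside version 3, the whole while True block could then become `if not skip_block(my_file, CLOSING_LINE): return False`.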

It seems that these three implementations achieve the same result. So...

Question: which one should I use? Which one is the most Pythonic? Which one is the most efficient? Which one would you use?

Is there a better solution altogether?
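One commonly used alternative, sketched here under assumptions rather than as the definitive answer, changes how the output is written. The input is streamed once, and kept lines go to a temporary file in the same directory. The temporary file replaces the original with os.replace() only after a complete pass. This avoids holding the whole file in a string. If the closing line is missing, the original is left untouched. The function name delete_blocks and its parameters are hypothetical.

```python
import os
import shutil
import tempfile


def delete_blocks(filename, starting_line, closing_line):
    """Remove every starting_line ... closing_line block from filename.

    Hypothetical helper. Returns False, leaving the file unchanged, if a
    block is never closed; returns True after rewriting the file otherwise.
    """
    start = starting_line.strip()
    end = closing_line.strip()
    directory = os.path.dirname(os.path.abspath(filename))
    # Temporary file in the same directory, so os.replace() does not have
    # to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(filename) as src:
            in_block = False
            for line in src:
                stripped = line.strip()
                if in_block:
                    if stripped == end:
                        in_block = False  # closing line is not copied
                elif stripped == start:
                    in_block = True  # starting line is not copied
                else:
                    dst.write(line)  # kept lines are streamed, not accumulated
        if in_block:
            return False  # no closing line: original file left untouched
        shutil.copymode(filename, tmp_path)  # mkstemp creates owner-only files
        os.replace(tmp_path, filename)
        return True
    finally:
        # Remove the temporary file unless it has already replaced the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Called as delete_blocks("test.txt", STARTING_LINE, CLOSING_LINE), it removes all blocks in one pass. The trade-off is a second file on disk during the rewrite, compared with the in-place seek/truncate/write used above.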


Edit: I benchmarked the three methods on a big file with timeit. I renamed them delete_lines_if, delete_lines_for and delete_lines_while, and removed the last three lines of each (seek, truncate, write) so that the file is not modified.

import timeit

t_if = timeit.Timer("delete_lines_if('test.txt')", "from __main__ import delete_lines_if")
t_for = timeit.Timer("delete_lines_for('test.txt')", "from __main__ import delete_lines_for")
t_while = timeit.Timer("delete_lines_while('test.txt')", "from __main__ import delete_lines_while")

print(t_if.repeat(3, 2000))
print(t_for.repeat(3, 2000))
print(t_while.repeat(3, 2000))

Results (total seconds for 2000 calls, three repeats each):

# Using IF statements:
[5.249358177185059, 5.226311922073364, 5.234260082244873]
# Using nested FOR:
[4.798391103744507, 4.613671064376831, 4.6676459312438965]
# Using while:
[4.635796070098877, 4.649979114532471, 4.6590471267700195]
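Each number is the total time for 2000 calls, so dividing by 2000 gives the cost of a single call. The timeit documentation recommends looking at the minimum of the repeats. The small sketch below applies both steps to the numbers reported above. The values are copied from the results, and nothing is re-measured.

```python
# Totals in seconds for 2000 calls each, copied from the results above.
results = {
    "if": [5.249358177185059, 5.226311922073364, 5.234260082244873],
    "nested for": [4.798391103744507, 4.613671064376831, 4.6676459312438965],
    "while": [4.635796070098877, 4.649979114532471, 4.6590471267700195],
}
CALLS = 2000

for name, totals in results.items():
    per_call_ms = min(totals) / CALLS * 1000  # best repeat, per call, in ms
    print(f"{name:>10}: {per_call_ms:.3f} ms per call")
# if: 2.613, nested for: 2.307, while: 2.318 (ms per call)
```

On these numbers the two nested-loop versions are within about 0.5% of each other, and version 1 takes roughly 13% longer per call. Version 1 is also the only one that calls print(line) for every kept line, which may account for part of that gap. This was not measured separately.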
