Thursday, August 2, 2018

Why does this for loop seem to break after a true if statement?

All similar questions I've found involve wanting to break a for loop, whereas I DON'T want mine to break.

I have a data set of presentation titles (strings), each paired with an attendance value. I want to split each title into its constituent words and pair every word with the attendance value of the title it came from. That part works. The complication is that some keywords are not single words: I have a predetermined set of two-word pairs (stored in the set wordpairsset, which I imported from a CSV) that I would like to count as one word (e.g. "hydraulic fracturing", "Eagle Ford").

The code I have here almost works. However, when running through each title, the for loop that looks for two-word pairs (for i in pairs(keyword_list)) only seems to run until it finds a two-word pair that is in the set; then it jumps out and goes on to the next for loop (for keyword in keyword_list). So everything works for titles that contain only one of those special two-word pairs, but if a title contains two of them, the second one isn't caught and is instead filed as two separate words.

I found this question and thought maybe it was somehow an iterable vs. generator issue, but I tried rewriting for i in pairs(keyword_list) in an iterable form and it didn't help (and did run more slowly).

I know you're not supposed to use index variables in Python, but I couldn't figure out how to use enumerate() here.
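To make the goal concrete, here is a minimal sketch of the result I'm after, written with a plain position counter instead of enumerate() and without deleting anything from the list. The helper name split_keywords and the sample title are hypothetical, and the sketch assumes pairs should be matched greedily from left to right, with no wrap-around from the last word to the first.

```python
def split_keywords(title, wordpairs):
    """Split a title into keywords, keeping known two-word pairs together."""
    words = title.split()
    keywords = []
    i = 0
    while i < len(words):
        # Look ahead one word and test the candidate pair for set membership.
        if i + 1 < len(words) and words[i] + ' ' + words[i + 1] in wordpairs:
            keywords.append(words[i] + ' ' + words[i + 1])
            i += 2  # skip both words of the matched pair
        else:
            keywords.append(words[i])
            i += 1
    return keywords


# Hypothetical sample data standing in for the CSV contents.
sample_pairs = {"hydraulic fracturing", "Eagle Ford"}
print(split_keywords("hydraulic fracturing in Eagle Ford shale", sample_pairs))
```

Unlike my actual code, this keeps pairs and single words in title order rather than writing all the pairs first; for one output row per keyword that difference should not matter.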

wordpairsset = set()
for row in wordpairsReader:
    wordpairsset.add(row[0])

def pairs(lst):
    return zip(lst,lst[1:]+[lst[0]])

for row in attendanceReader:
    keyword_list = [x.strip() for x in row[0].split()]

    # First go through and look for common two-word pairs; write each to the
    # output file with its associated attendance data and remove it from the list
    index = 0
    for i in pairs(keyword_list):
        pair = i[0] + ' ' + i[1]
        if pair in wordpairsset: # test pairs for membership in set
            outputWriter.writerow([pair, row[1]])
            del keyword_list[index]
            if index <= len(keyword_list)-1:
                del keyword_list[index]
        if index <= len(keyword_list)-1:
            index += 1

    # Then go through and write remaining single words with their associated
    # attendance data
    for keyword in keyword_list:
        outputWriter.writerow([keyword, row[1]])
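
Stripped of the CSV readers and writer, the same loop reproduces the symptom. This is a minimal sketch assuming Python 3; run_pair_loop, the sample title, and the sample pair set are hypothetical stand-ins, and the seen list is only there to record which pairs the loop actually tests.

```python
def pairs(lst):
    # Same helper as above: the first argument to zip is the live list,
    # the second is a shifted copy built before the loop starts.
    return zip(lst, lst[1:] + [lst[0]])


def run_pair_loop(title, wordpairs):
    keyword_list = title.split()
    seen, matched = [], []
    index = 0
    for i in pairs(keyword_list):
        pair = i[0] + ' ' + i[1]
        seen.append(pair)  # record every pair the loop tests
        if pair in wordpairs:
            matched.append(pair)
            del keyword_list[index]
            if index <= len(keyword_list) - 1:
                del keyword_list[index]
        if index <= len(keyword_list) - 1:
            index += 1
    return seen, matched, keyword_list


seen, matched, leftover = run_pair_loop(
    "hydraulic fracturing in Eagle Ford shale",
    {"hydraulic fracturing", "Eagle Ford"},
)
print(seen)      # pairs tested, in order
print(matched)   # pairs found in the set
print(leftover)  # words left to be written as single keywords
```

Under Python 3, where zip is lazy, the pairs tested after the first match are no longer adjacent words from the title, and the loop ends before reaching the end of the original word list. That suggests the deletions from keyword_list are disturbing the iteration, rather than the if statement breaking out of the loop.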
