All similar questions I've found involve wanting to break a for loop, whereas I DON'T want mine to break.
I have a set of data that consists of presentation titles (strings), each paired with an attendance value. I want to break each title into its constituent words and pair each word with the attendance value that was originally paired with the title. I did that, but the complication is that some keywords are not single words: I have some predetermined two-word pairs (stored in the set wordpairsset, which I imported from a CSV) that I would like to count as one word (e.g. "hydraulic fracturing", "Eagle Ford").
The code I have here almost works. However, when running through each title, the for loop that looks for two-word pairs (for i in pairs(keyword_list)) only runs until it finds a two-word pair that's in the set; then it jumps out and goes on to the next for loop (for keyword in keyword_list). So everything works for titles that contain only one of those special two-word pairs, but if a title contains two of them, the second one doesn't get caught and is instead filed as two separate words.
I found this question and thought maybe it was somehow an iterable vs. generator issue, but I tried rewriting for i in pairs(keyword_list) in an iterable form and it didn't help (and did run more slowly).
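To make the symptom easier to reproduce, here is a minimal sketch, assuming Python 3 (where zip is lazy) and a simplified version of the loop. The function name pairs_seen_while_deleting and the sample title are made up for illustration; only pairs() comes from my code. It records every pair the loop actually visits while matched words are deleted from the list being zipped.

```python
def pairs(lst):
    # Same helper as in the question: each word paired with its successor,
    # wrapping around from the last word to the first.
    return zip(lst, lst[1:] + [lst[0]])


def pairs_seen_while_deleting(words, known_pairs):
    """Record every pair the loop visits while deleting matched pairs.

    Hypothetical helper for illustration only; a simplified stand-in for
    the loop in the question, not a copy of it.
    """
    words = list(words)  # work on a copy so the caller's list is untouched
    seen = []
    index = 0
    for first, second in pairs(words):
        seen.append((first, second))
        if first + ' ' + second in known_pairs:
            # Mutates the very list that zip is still reading from.
            del words[index:index + 2]
        else:
            index += 1
    return seen


title = ['hydraulic', 'fracturing', 'in', 'Eagle', 'Ford']
known = {'hydraulic fracturing', 'Eagle Ford'}
seen = pairs_seen_while_deleting(title, known)
print(seen)
```

In this sketch the pair ('Eagle', 'Ford') never shows up in seen: after the first deletion, the live list and the shifted copy no longer line up, and the loop ends early because the shortened list runs out.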
I know you're not supposed to use index variables in Python, but I couldn't figure out how to use enumerate() here.
wordpairsset = set()
for row in wordpairsReader:
    wordpairsset.add(row[0])

def pairs(lst):
    return zip(lst,lst[1:]+[lst[0]])

for row in attendanceReader:
    keyword_list = [x.strip() for x in row[0].split()]

    # First go through and look for common two-word pairs; write to
    # output file with associated attendance data and remove from string
    index = 0
    for i in pairs(keyword_list):
        pair = i[0] + ' ' + i[1]
        if pair in wordpairsset: # test pairs for membership in set
            outputWriter.writerow([pair, row[1]])
            del keyword_list[index]
            if index <= len(keyword_list)-1:
                del keyword_list[index]
        if index <= len(keyword_list)-1:
            index += 1

    # Then go through and write remaining single words with their associated
    # attendance data
    for keyword in keyword_list:
        outputWriter.writerow([keyword, row[1]])
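
For reference, the behaviour I'm after can be sketched without deleting from the list during iteration. This is only a sketch under assumptions: split_keywords and known_pairs are hypothetical names, it walks the words with an explicit index instead of pairs(), it does not wrap around from the last word to the first, and it returns keywords in title order rather than pairs first.

```python
def split_keywords(title, known_pairs):
    """Split a title into keywords, keeping known two-word pairs together.

    Hypothetical helper: `known_pairs` plays the role of wordpairsset.
    """
    words = title.split()
    keywords = []
    i = 0
    while i < len(words):
        # Look ahead one word, if there is one, and test the two-word pair.
        if i + 1 < len(words):
            candidate = words[i] + ' ' + words[i + 1]
            if candidate in known_pairs:
                keywords.append(candidate)
                i += 2  # skip both words of the matched pair
                continue
        keywords.append(words[i])
        i += 1
    return keywords


known_pairs = {'hydraulic fracturing', 'Eagle Ford'}
result = split_keywords('hydraulic fracturing in the Eagle Ford', known_pairs)
print(result)
```

Each returned keyword could then be written out with outputWriter.writerow([keyword, row[1]]), as in the second loop above.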