Thursday, 31 January 2019

Try, Except / If Statement Combination - Missing results

I am comparing one list of universities with several others, finding fuzzy string matches and writing results to a csv. Example of the lists:

data = ["MIT", "Stanford",...]

Data1 = ['MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)'], ['STANFORD UNIVERSITY'],...

With StackOverflow's help I got as far as:

for uni in data:
    hit = process.extractOne(str(uni[1]), data10, scorer=fuzz.token_set_ratio, score_cutoff=90)
    try:
        if float(hit[1]) >= 94:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 10})

    except:
        hit1 = process.extractOne(str(uni[1]), data11, scorer=fuzz.token_set_ratio, score_cutoff=90)
        try:
            if float(hit1[1]) >= 94:
                with open(filename, mode='a', newline="") as csv_file:
                    fieldnames = [""]
                    writer = csv.DictWriter("")
                    writer.writerow({""})

... until the last excepts where I include those with scores lower than 94 and end with a "not found":

    except:
        hit12 = process.extractOne(str(uni[1]), data9, scorer=fuzz.token_set_ratio)
        try:
            if float(hit12[1]) < 94:
                with open(filename, mode='a', newline="") as csv_file:
                    fieldnames = [""]
                    writer = csv.DictWriter("")
                    writer.writerow({""})
        except:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = [""]
                writer = csv.DictWriter("")
                writer.writerow({""})

However, I get only 2854 results instead of the 3175 in my original list (all of which need to be checked and written to the new csv).
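
One path worth checking is a hit that clears score_cutoff = 90 but scores below 94. The sketch below reproduces a single level of the try/except with a hypothetical helper, classify. It assumes process.extractOne returns None when nothing reaches the cutoff and a (match, score) tuple otherwise. Under that assumption, a score between 90 and 93 raises nothing, so the except branch never runs and no row is written.

```python
def classify(hit, threshold=94):
    """Mimic one level of the nested try/except (hypothetical helper).

    `hit` stands in for the value returned by process.extractOne with
    score_cutoff=90: assumed to be None or a (match, score) tuple.
    """
    try:
        if float(hit[1]) >= threshold:
            return "written"
    except TypeError:
        # hit is None: subscripting raises, so the next list gets tried
        return "falls through to next list"
    # hit exists but is below the threshold: no write and no exception
    return "silently dropped"


print(classify(("STANFORD UNIVERSITY", 100)))  # written
print(classify(None))                          # falls through to next list
print(classify(("STANFORD UNIVERSITY", 92)))   # silently dropped
```

If this is what happens, the missing rows would be the entries whose best score in one of the lists is at least 90 but below 94.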

When I throw all my lists together and do my extractOne I do get 3175 results:

scored_testdata = []
for uni in data:
    hit = process.extractOne(str(uni[1]), big_list, scorer=fuzz.token_set_ratio, score_cutoff=90)
    scored_testdata.append(hit)
print(len(scored_testdata))

What am I missing here? I suspect that entries for which process.extractOne returns None are being dropped for some reason. Any help would be much appreciated.
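
A flatter loop that writes exactly one row per input would make the row count easy to verify. The following is a minimal sketch, not the original code. The names first_match, write_results and toy_extract_one are hypothetical. toy_extract_one is a stand-in for process.extractOne so the example runs without the fuzzy-matching library, and the points values are placeholders.

```python
import csv
import io


def first_match(name, candidate_lists, extract_one, threshold=94):
    """Return the first (match, score) hit meeting the threshold, else None."""
    for candidates in candidate_lists:
        hit = extract_one(name, candidates)
        # Test for None explicitly instead of relying on an exception
        if hit is not None and hit[1] >= threshold:
            return hit
    return None


def write_results(rows, candidate_lists, extract_one, csv_file):
    """Write one row per input row, matched or not, and return the count."""
    fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
    written = 0
    for bwbnr, uni_name in rows:
        hit = first_match(uni_name, candidate_lists, extract_one)
        writer.writerow({
            'bwbnr': str(bwbnr),
            'uni_name': str(uni_name),
            'match': str(hit) if hit else 'not found',
            'points': 10 if hit else 0,  # placeholder values
        })
        written += 1
    return written


def toy_extract_one(name, candidates):
    """Toy stand-in for process.extractOne: score 100 on a substring match."""
    for candidate in candidates:
        if name.lower() in candidate.lower():
            return (candidate, 100)
    return None


rows = [("1", "MIT"), ("2", "Stanford"), ("3", "Nowhere")]
lists = [["MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)"], ["STANFORD UNIVERSITY"]]
buffer = io.StringIO()
count = write_results(rows, lists, toy_extract_one, buffer)
print(count == len(rows))  # True
```

Because every input row reaches writer.writerow exactly once, the output length can be checked directly against len(data). The file is also opened once rather than once per row.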
