Thursday, 10 September 2020

Avoiding a duplicated line of code with the "or" operator does not give the same results as duplicating the line

I have three lists of different words. A word can appear in more than one list. I also have a list of sentences. For each sentence, I check whether the word at a particular index is present in one of these word lists and, if so, I append the sentence to a result list. I did this in two ways.

The first way:

I write the code for one of the word lists and duplicate it, changing only the name of the list and the variable the sentences are appended to.

if pattern_list_tup[pat][2]:
    if span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]]:
        if sent not in sentence_extract_lexique_1:
            sentence_extract_lexique_1.append(sent)
    else:
        sentence_not_extract_lexique_1.append(sent)

if pattern_list_tup[pat][2]:
    if span[value_of_attribute[pat]] in lexique_2[pattern_list_tup[pat][1]]:
        if sent not in sentence_extract_lexique_1:
            sentence_extract_lexique_2.append(sent)
    else:
        sentence_not_extract_lexique_2.append(sent)

if pattern_list_tup[pat][2]:
    if span[value_of_attribute[pat]] in lexique_3[pattern_list_tup[pat][1]]:
        if sent not in sentence_extract_lexique_1:
            sentence_extract_lexique_3.append(sent)
    else:
        sentence_not_extract_lexique_3.append(sent)

As you can see, I duplicate the first block and reuse it for the other lists (lexique). I then print the size of the union of the three result lists, as below:

Lists = [sentence_extract_lexique_1, sentence_extract_lexique_2, sentence_extract_lexique_3]

all_union = set.union(*map(set, Lists))
print("Union des 4 dictionnaires pour phrases extraites", len(all_union), "\n")

# Union des 4 dictionnaires pour phrases extraites  1003

Then I came up with another way that avoids duplicating the block: I use the "or" operator, as below:

if pattern_list_tup[pat][2]:
    if span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]] or lexique_2[pattern_list_tup[pat][1]] or lexique_3[pattern_list_tup[pat][1]]:
        # print(span[value_of_attribute[pat]])
        if sent not in sentence:
            sentence.append(sent)
    else:
        sentence_not.append(sent)
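
A note on how Python reads the condition above: "in" binds more tightly than "or", so `x in a or b or c` is grouped as `(x in a) or b or c`, and a non-empty list is truthy on its own. The sketch below uses made-up word lists (`lexique_a`, `lexique_b`, `lexique_c` and `word` are hypothetical stand-ins, not the real lexicons) to show the difference between the chained form and an explicit test against each list.

```python
# Hypothetical stand-ins for the three word lists; the real lexicons are not shown in the post.
lexique_a = ["chat", "chien"]
lexique_b = ["maison"]
lexique_c = ["voiture"]

word = "banane"  # present in none of the lists

# Grouped by Python as: (word in lexique_a) or lexique_b or lexique_c
chained = word in lexique_a or lexique_b or lexique_c

# Explicit membership test against each list
explicit = word in lexique_a or word in lexique_b or word in lexique_c

print(bool(chained))  # True, because the non-empty lexique_b is truthy
print(explicit)       # False
```

Whether this applies to the original data depends on `lexique_2[...]` or `lexique_3[...]` being non-empty, which seems likely but is not shown in the post.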

After this block, I print the result, before and after using "set" to remove duplicates.

print(" phrases totales ", len(sentence))

# phrases totale 1996

sentence = set(sentence)
print("Total après suppression ", len(sentence))

# Total après suppression 1556

I am wondering why the results are different. Using "or" and then removing duplicates should have given the same result as the first way (I believed). Maybe someone can help me figure out why. I had already presented my work with the earlier version; afterwards I rechecked the code and, to improve it, used "or". If the two answers are not the same, does this mean my first solution is wrong, or is it the second one?
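
A minimal sketch of one way to avoid the duplication while keeping the result equal to the union from the first approach: test the word against each list explicitly with `any(...)`. The data is invented for illustration (`lexicons`, `sentences`, `per_lexicon`, `combined` and `buggy` are hypothetical names; the real code indexes into `span` and `pattern_list_tup`), so this only demonstrates the logic, not the original pipeline.

```python
# Hypothetical data: three word lists and (word, sentence) pairs.
lexicons = [
    ["chat", "chien"],
    ["chien", "maison"],
    ["voiture"],
]
sentences = [
    ("chat", "le chat dort"),
    ("maison", "la maison est grande"),
    ("banane", "la banane est jaune"),
    ("chien", "le chien court"),
]

# First approach: one result list per lexicon, then the union.
per_lexicon = [[s for w, s in sentences if w in lex] for lex in lexicons]
union_of_lists = set.union(*map(set, per_lexicon))

# Second approach with an explicit membership test for every lexicon.
combined = {s for w, s in sentences if any(w in lex for lex in lexicons)}

# Second approach as written in the question: the non-empty lists are truthy,
# so every sentence passes the condition.
buggy = {
    s for w, s in sentences
    if w in lexicons[0] or lexicons[1] or lexicons[2]
}

print(len(union_of_lists), len(combined), len(buggy))  # 3 3 4
```

If the same mechanism is at work in the original code, the "or" version would accept every sentence that reaches that branch, which would be consistent with a count larger than the union from the first approach.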
