I have 3 lists of different words. A word can appear in more than one list. I also have a list of sentences. For each sentence, I check whether the word at a particular index is present in one of the word lists above and, if it is, I append the sentence to a result list. I tried two approaches.
The first approach: I wrote the code for one of the word lists, then duplicated it, changing only the name of the list and the variable the sentences are appended to:
    if (pattern_list_tup[pat][2]):
        if (span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]]):
            if sent not in sentence_extract_lexique_1:
                sentence_extract_lexique_1.append(sent)
        else:
            sentence_not_extract_lexique_1.append(sent)

    if (pattern_list_tup[pat][2]):
        if (span[value_of_attribute[pat]] in lexique_2[pattern_list_tup[pat][1]]):
            if sent not in sentence_extract_lexique_1:
                sentence_extract_lexique_2.append(sent)
        else:
            sentence_not_extract_lexique_2.append(sent)

    if (pattern_list_tup[pat][2]):
        if (span[value_of_attribute[pat]] in lexique_3[pattern_list_tup[pat][1]]):
            if sent not in sentence_extract_lexique_1:
                sentence_extract_lexique_3.append(sent)
        else:
            sentence_not_extract_lexique_3.append(sent)
As you can see, I duplicated the first block of code and reused it for the other lists (lexique_2 and lexique_3). I then printed the size of the union of the three result lists, as below:
    Lists = [sentence_extract_lexique_1, sentence_extract_lexique_2, sentence_extract_lexique_3]
    all_union = set.union(*map(set, Lists))
    print("Union des 4 dictionnaires pour phrases extraites", len(all_union), "\n")
    # Union des 4 dictionnaires pour phrases extraites 1003
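For illustration, the first approach can be condensed into a loop over the lexicons followed by the same set union. This is only a sketch on toy data: `lexicons`, `sentences`, and `extract_by_lexicon` are hypothetical names, and each sentence is reduced to a (word, text) pair instead of the `span` / `pattern_list_tup` structures used above.

```python
# Toy stand-ins (assumptions, not the real data): each lexicon is a set of words.
lexicons = {
    "lexique_1": {"cat", "dog"},
    "lexique_2": {"dog", "bird"},
    "lexique_3": {"fish"},
}

# Each sentence is paired with the word found at the index of interest.
sentences = [
    ("dog", "The dog barks."),
    ("bird", "A bird sings."),
    ("tree", "The tree is tall."),
    ("dog", "The dog barks."),  # repeated on purpose
]


def extract_by_lexicon(sentences, lexicons):
    """Return one de-duplicated list of matching sentences per lexicon."""
    extracted = {name: [] for name in lexicons}
    for word, text in sentences:
        for name, words in lexicons.items():
            # The membership test is done against each lexicon separately.
            if word in words and text not in extracted[name]:
                extracted[name].append(text)
    return extracted


extracted = extract_by_lexicon(sentences, lexicons)
all_union = set.union(*map(set, extracted.values()))
print(len(all_union))
```

Here the union contains every sentence whose word matched at least one lexicon, counted once, which is the quantity the first approach reports.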
Then I came up with another way that avoids duplicating the first block: I used the "or" operator, as below:
    if (pattern_list_tup[pat][2]):
        if (span[value_of_attribute[pat]] in lexique_1[pattern_list_tup[pat][1]] or lexique_2[pattern_list_tup[pat][1]] or lexique_3[pattern_list_tup[pat][1]]):
            #print(span[value_of_attribute[pat]])
            if sent not in sentence:
                sentence.append(sent)
        else:
            sentence_not.append(sent)
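One thing worth checking here is how Python parses a condition of the form `x in a or b or c`: the `in` test applies only to `a`, and `b` and `c` are then evaluated for truthiness on their own. The sketch below uses toy lists (the names `lex_a`, `lex_b`, `lex_c` are hypothetical) to show how that differs from testing membership in each list.

```python
lex_a = ["cat"]
lex_b = ["dog"]
lex_c = ["fish"]
word = "tree"  # not present in any of the lists

# Parsed as: (word in lex_a) or lex_b or lex_c
# lex_b is a non-empty list, so the whole expression is truthy.
chained = bool(word in lex_a or lex_b or lex_c)

# Membership tested against every list explicitly.
explicit = word in lex_a or word in lex_b or word in lex_c

# Equivalent, more compact form of the explicit test.
with_any = any(word in lex for lex in (lex_a, lex_b, lex_c))

print(chained, explicit, with_any)  # True False False
```

If the real lexicon entries are non-empty, a condition written in the first form would accept every sentence that reaches it, which could explain a count larger than the union from the first approach.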
After this code I printed the number of sentences, then used "set" to remove duplicates and printed the count again.
    print(" phrases totales ", len(sentence))
    # phrases totales 1996
    sentence = set(sentence)
    print("Total après suppression ", len(sentence))
    # Total après suppression 1556
I was wondering why the results are different. Using "or" and then removing duplicates should have given the same result as the first approach (or so I believed). Maybe someone can help me figure out why. I presented my work using one version, but afterwards I re-checked the code and, to improve it, used "or". If the two answers are not the same, does this mean my first solution is wrong, or is the second one the correct one?