Good morning,
I have a question about the use of the "or" operator. I have 8 lists of positive and negative words for 4 dictionaries (lexicons):
list_pos_1 & list_neg_1 for lexicon 1
list_pos_2 & list_neg_2 for lexicon 2
list_pos_3 & list_neg_3 for lexicon 3
list_pos_4 & list_neg_4 for lexicon 4
I go through a list of sentences and, for each sentence, I check whether the words that appear before or after a connector (I have a list of connectors) are present in any of the lists above.
Then I apply some rules to get the polarity of the sentence, based on the number of positive and negative words found in it, according to 3 cases:
first case: I take into account the positive and negative words found before the connector
second case: I take into account the positive and negative words found after the connector
third case: I take into account the positive and negative words found before and after the connector
pos >/</= neg {connector} pos </>/= neg
For each case and for each rule I should get a list of sentences.
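To make the three cases concrete, here is a minimal sketch of the split around the connector and the word counting. The helper names `split_at_last_connector` and `count_polarity`, the toy sentence, and the toy word sets are hypothetical and only for illustration; the connector list is the one used in the script.

```python
# Connectors taken from the script; everything else here is illustrative.
CONNECTORS = {"mais", "pourtant", "néanmoins", "cependant", "toutefois", "dès", "bien"}

def split_at_last_connector(tokens):
    """Return (before, after) around the last connector, or None if there is none."""
    positions = [i for i, tok in enumerate(tokens) if tok in CONNECTORS]
    if not positions:
        return None
    pivot = max(positions)  # keep only the last connector, as in the script
    return tokens[:pivot], tokens[pivot + 1:]

def count_polarity(words, pos_words, neg_words):
    """Count positive and negative words; negative membership is tested first."""
    n_pos = n_neg = 0
    for w in words:
        if w in neg_words:
            n_neg += 1
        elif w in pos_words:
            n_pos += 1
    return n_pos, n_neg

# Toy data (assumption, not real lexicon content).
toy_pos = {"beau"}
toy_neg = {"long", "ennuyeux"}
tokens = ["film", "beau", "mais", "long", "ennuyeux"]

before, after = split_at_last_connector(tokens)
print(count_polarity(before, toy_pos, toy_neg))          # case 1: before the connector
print(count_polarity(after, toy_pos, toy_neg))           # case 2: after the connector
print(count_polarity(before + after, toy_pos, toy_neg))  # case 3: both sides
```

Slicing by position (`tokens[:pivot]`) keeps the "before" and "after" parts separate even when the same word appears on both sides of the connector.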
I print the results for each lexicon separately, for all the lexicons at the same time, and also for two dictionaries at the same time (lexicon 1 & 2, lexicon 3 & 4).
I observed that the results are all different. Let me explain:
If I check lexicon 1 and lexicon 2 separately, I get different numbers of sentences for each polarity. If I combine the search over the two dictionaries at the same time, I get yet another result, and it does not correspond to the total of the results I get from lexicon 1 and lexicon 2 separately. I use the "or" operator to search in either lexicon 1 or lexicon 2, so I was expecting the result to be the combination of the results I get for the two dictionaries separately:
for lexicon 1: I get 44 sentences which are negative
for lexicon 2: I get 82 sentences which are negative
but if I use "lexicon 1 or lexicon 2": I get 91; I thought I would get 126 (which is the total of what I get for lexicon 1 & 2)
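A minimal sketch of why the combined count need not equal the sum. It uses made-up toy lexicons and sentences (not the real data), and `is_negative` is a hypothetical helper that calls a sentence negative when it has strictly more negative than positive words. Testing `w in a or w in b` is equivalent to testing membership in the union `a | b`, so the combined search changes the positive and negative counts of each sentence before the comparison is made.

```python
def is_negative(words, pos_words, neg_words):
    """True when the sentence has strictly more negative than positive words."""
    n_neg = sum(1 for w in words if w in neg_words)
    n_pos = sum(1 for w in words if w not in neg_words and w in pos_words)
    return n_neg > n_pos

# Toy lexicons (assumption, for illustration only).
pos1, neg1 = {"bon"}, {"lent"}
pos2, neg2 = {"joli"}, {"lent", "cher"}

sentences = [
    ["lent"],                  # negative in both lexicons: counted twice in a sum
    ["cher"],                  # negative only in lexicon 2
    ["lent", "joli", "joli"],  # negative in lexicon 1, outweighed once "joli" is known
    ["bon", "cher"],           # negative in lexicon 2, tied once "bon" is known
]

n1 = sum(is_negative(s, pos1, neg1) for s in sentences)
n2 = sum(is_negative(s, pos2, neg2) for s in sentences)
# "w in neg1 or w in neg2" is the same test as "w in (neg1 | neg2)".
n12 = sum(is_negative(s, pos1 | pos2, neg1 | neg2) for s in sentences)

print(n1, n2, n12)  # 2 3 2
```

In this toy case the combined lexicon gives 2 negative sentences, not 2 + 3 = 5: one sentence is negative in both lexicons, and two others stop being negative because the combined lexicon also adds positive words to the count.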
Maybe it is correct and I did not understand how "or" works. Below are the script and the results.
# v is the sentence, already split into tokens. I check every string w in v to see if it is
# present in my lists of positive or negative words. I do this for each lexicon
# and also for the 4 lexicons at the same time.
v = [word.lemma_.lower().strip() for word in mytokens if word.pos_ != "PUNCT" and word.pos_ != "SPACE"]
for i, j in enumerate(v):
    if j == 'mais' or j == 'pourtant' or j == 'néanmoins' or j == 'cependant' or j == 'toutefois' or j == 'dès' or j == 'bien':
        #print(j , i )
        liste_index_pivot.append(i)
#print(liste_index_pivot)
if len(liste_index_pivot) == 0:
    elts_sans_w_pivot.append(k)
else:
    for w in v:
        # Index of the connector: I take the highest connector index and discard the others,
        # in case there are several connectors in the sentence.
        ind_pivot = max(liste_index_pivot)
        #print(ind_pivot)
        ind = v.index(w)
        # words before the index of the connector
        if ind < ind_pivot:
            # look in the negative and positive lists of the 4 lexicons
            if w in liste_neg_F or w in liste_neg_D or w in liste_neg_A or w in liste_neg_P:
                d_neg_av_t.append(w)
            elif w in liste_pos_F or w in liste_pos_D or w in liste_pos_A or w in liste_pos_P:
                d_pos_av_t.append(w)
            else:
                d_0_av_t.append(w)
            # look in the lists of two different lexicons
            if w in liste_neg_F or w in liste_neg_D:
                d_neg_av_fd.append(w)
            elif w in liste_pos_F or w in liste_pos_D:
                d_pos_av_fd.append(w)
            else:
                d_0_av_fd.append(w)
            # look in the lists of one particular dictionary
            if w in liste_neg_F:
                d_neg_av_f.append(w)
            elif w in liste_pos_F:
                d_pos_av_f.append(w)
            else:
                d_0_av_f.append(w)
            # look in the lists of one particular dictionary
            if w in liste_neg_D:
                d_neg_av_d.append(w)
            elif w in liste_pos_D:
                d_pos_av_d.append(w)
            else:
                d_0_av_d.append(w)
        else:
            None
    # Collecting the number of positive and negative words to apply the rules
    len_d_pos_av_t = len(d_pos_av_t)
    len_d_neg_av_t = len(d_neg_av_t)
    len_d_pos_av_fd = len(d_pos_av_fd)
    len_d_neg_av_fd = len(d_neg_av_fd)
    len_d_pos_av_f = len(d_pos_av_f)
    len_d_neg_av_f = len(d_neg_av_f)
    len_d_pos_av_d = len(d_pos_av_d)
    len_d_neg_av_d = len(d_neg_av_d)
    # Rules for each dictionary
    if len_d_pos_av_t >= len_d_neg_av_t:
        if k not in Liste_M_t:
            Liste_M_t.append(k)
    elif len_d_pos_av_t <= len_d_neg_av_t:
        if k not in Liste_F_t:
            Liste_F_t.append(k)
    else:
        if k not in Liste_A_t:
            Liste_A_t.append(k)
    if len_d_pos_av_fd >= len_d_neg_av_fd:
        if k not in Liste_M_fd:
            Liste_M_fd.append(k)
    elif len_d_pos_av_fd <= len_d_neg_av_fd:
        if k not in Liste_F_fd:
            Liste_F_fd.append(k)
    else:
        if k not in Liste_A_fd:
            Liste_A_fd.append(k)
    ########################################
    if len_d_pos_av_f >= len_d_neg_av_f:
        if k not in Liste_M_f:
            Liste_M_f.append(k)
    elif len_d_pos_av_f <= len_d_neg_av_f:
        if k not in Liste_F_f:
            Liste_F_f.append(k)
    else:
        if k not in Liste_A_f:
            Liste_A_f.append(k)
    ############################################
    if len_d_pos_av_d >= len_d_neg_av_d:
        if k not in Liste_M_d:
            Liste_M_d.append(k)
    elif len_d_pos_av_d <= len_d_neg_av_d:
        if k not in Liste_F_d:
            Liste_F_d.append(k)
    else:
        if k not in Liste_A_d:
            Liste_A_d.append(k)
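One detail of the script that may matter: `v.index(w)` returns the position of the first occurrence of `w` in `v`, so a word that appears both before and after the connector is always treated as being before it. A minimal sketch with a toy sentence (the sentence and the variable names `via_index` and `via_enumerate` are made up for illustration):

```python
v = ["bon", "mais", "bon"]  # toy sentence: same word on both sides of the connector
pivot = 1                   # position of "mais"

# list.index always returns the FIRST matching position (0 for both "bon").
via_index = [w for w in v if v.index(w) < pivot]

# enumerate gives the real position of each token.
via_enumerate = [w for i, w in enumerate(v) if i < pivot]

print(via_index)      # ['bon', 'bon']
print(via_enumerate)  # ['bon']
```

Looping with `enumerate(v)` instead of calling `v.index(w)` avoids counting the second "bon" as a word found before the connector.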
Results:
Polarity method: before (avant)
*** Lexicon 1 & 2 *****
Liste F : 91 Liste M : 339 Liste A : 0
*** All lexicon *****
Liste_F: 9 Liste M: 421 Liste A : 0
*** Lexicon 1 *****
Liste_F: 44 Liste M: 386 Liste A : 0
*** Lexicon 2 *****
Liste_F: 82 Liste M: 348 Liste A : 0
As you can see, the results are not really consistent with one another.
So I was wondering whether I am using the "or" operator incorrectly, or whether the "for", "if" and "elif" statements are misplaced.
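On the "if"/"elif" part of the question, a minimal sketch of the rule as written in the script. With `>=` in the first test and `<=` in the second, every pair of counts is caught by one of the first two branches, so the final `else` can never run, which would be consistent with "Liste A : 0" in every result. The function names are hypothetical, and `classify_strict` assumes that the third list is meant for ties, which the script does not state.

```python
def classify_as_written(n_pos, n_neg):
    """Same comparisons as the script: >= first, then <=."""
    if n_pos >= n_neg:
        return "M"
    elif n_pos <= n_neg:
        return "F"   # only reached when n_pos < n_neg
    else:
        return "A"   # unreachable: every case is handled above

def classify_strict(n_pos, n_neg):
    """Variant with strict comparisons (assumption: list A is meant for ties)."""
    if n_pos > n_neg:
        return "M"
    elif n_pos < n_neg:
        return "F"
    else:
        return "A"

for n_pos, n_neg in [(2, 1), (1, 2), (1, 1)]:
    print(n_pos, n_neg, classify_as_written(n_pos, n_neg), classify_strict(n_pos, n_neg))
```

With the comparisons as written, a tie such as (1, 1) lands in list M; with strict comparisons it would land in list A.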