Friday, 16 December 2016

Python Regex TypeError: 'bool' object is not iterable & "If is not None" statement

1st Problem

I have a regex that ran fine as a single pattern, but since I added a second pattern and joined the two together I now get a TypeError about a bool. The second pattern is identical to the first apart from the variable name. I am only starting to build up my knowledge of regex and am still at beginner level when it comes to identifying issues; I just can't understand why the single pattern runs without error and the joined one won't.

2nd Problem

I am trying to write an if statement that works by this logic.

The code searches through a document, which is a chat between two people. For each sentence I check whether a word from ListA appears in it. If it does, I check whether any word from ListB is also in the sentence. If yes, print the sentence; if no, move on.
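The logic described above can be sketched roughly as follows. This is only a minimal illustration, not the real function: `find_first_word`, `matching_sentences`, `sentences`, `list_a` and `list_b` are hypothetical names and sample data standing in for the real chat lines, cleanLex and CategoryGA.

```python
import re


def find_first_word(words, sentence):
    """Return the first entry of `words` found as a whole word in `sentence`, else None."""
    for word in words:
        if re.search(r"\b" + re.escape(word) + r"\b", sentence, re.IGNORECASE):
            return word
    return None


def matching_sentences(sentences, list_a, list_b):
    """Collect (a_word, b_word, sentence) only when both lists have a hit."""
    hits = []
    for sentence in sentences:
        a_word = find_first_word(list_a, sentence)
        if a_word is None:
            continue  # no ListA word, so move on
        b_word = find_first_word(list_b, sentence)
        if b_word is None:
            continue  # ListA hit but no ListB word, so move on
        hits.append((a_word, b_word, sentence))
    return hits


# Hypothetical sample data, not the real chat or word lists.
sentences = ["I like to sleep in early", "I live in London", "Nice weather today"]
list_a = ["I"]
list_b = ["live", "London"]

hits = matching_sentences(sentences, list_a, list_b)
for a_word, b_word, sentence in hits:
    print("words matched %s and %s on Line:    %s" % (a_word, b_word, sentence))
```

Checking each list separately keeps the "both must match" rule explicit, at the cost of running one search per word.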

The problem is that it prints the sentence even if there is only one hit, and then prints the other word as None, like this:

words matched I and None on Line: I like to sleep in early

I have tried several conditions, such as:

if resultlex is not None and resultcat is not None:
if result.group(1) !=0 and result.group(2) !=0:
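One possible explanation for the None, assuming it comes from the joined pattern: when two groups are combined with `|`, a single `re.search` match fills only the alternative that matched and leaves the other group as None. Also, `group()` returns a string or None, never 0, so a `!= 0` test is always true. The sketch below shows this on the example line and then searches for each word separately; `both_present` and the sample words are hypothetical.

```python
import re

section = "I like to sleep in early"

# Joined pattern: only the alternative that matched first fills its group.
joined = r"\b(I)\b|\b(sleep)\b"
result = re.search(joined, section, re.IGNORECASE)
print(result.groups())


def both_present(lword, cword, section):
    """Return (lword_match, cword_match) only if both words occur in section."""
    lex = re.search(r"\b(" + re.escape(lword) + r")\b", section, re.IGNORECASE)
    cat = re.search(r"\b(" + re.escape(cword) + r")\b", section, re.IGNORECASE)
    if lex is None or cat is None:
        return None
    return lex.group(1), cat.group(1)


print(both_present("I", "sleep", section))
print(both_present("I", "live", section))
```

With two separate searches, each result can be tested against None on its own before anything is printed.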

Here is my full function where these issues occur:

import re
from collections import Counter
from SpeechActs.Categories import *
from Readfiles import *

# Speech Acts
CategoryGA = GA
CategoryPI = PersonalInfo
# ----------------------

# Word hit counters
CategoryHits = []
LexiconHits = []
# ----------------------

# unsure if used at this point
cleansedLex = []
# ----------------------

# Lists to hold lines where words have been matched
matchedCatlines = []
matchedLexlines = []
TestLine = []

def languagemodel():
    WordHit = None
    for line in cleanChat.values():
        for lword in cleanLex:
            for cword in CategoryGA:
                for section in line:
                    if any(lword in section and cword in section for lword in cleanLex for cword in
                           CategoryGA):  # searches each section to find words matching words stored in cleanLex
                        WordHit = False
                        patterns = r"\b(" + re.escape(lword) + r")\b", r"\b(" + re.escape(cword) + r")\b"  # pattern to match containing Lword
                        pattern = "|".join(patterns)  # joins the above patterns into one
                        if re.search(pattern, section, re.IGNORECASE):  # Running pattern
                            result = re.search(pattern, section,
                                               re.IGNORECASE)  # if match it displays match word with full line
                            for lword in cleanLex, cword in CategoryGA:
                                resultlex = result.group(1)
                                resultcat = result.group(2)
                            if resultlex is not None and resultcat is not None:
                                    LexiconHits.append(resultlex)
                                    CategoryHits.append(resultcat)
                                    WordHit = True
                                    if section not in TestLine:
                                        TestLine.append(section)
                                        print("words matched %s and %s on Line:    %s " % (resultlex, resultcat, section))

                    elif any(lword in section and cword in section for lword in cleanLex for cword in CategoryPI):
                        if len(lword) != 0 and len(cword) != 0:
                            if section not in TestLine:
                                TestLine.append(section)
                                # print("words matched %s and %s on Line:    %s " % (lword, cword, section))


languagemodel()

Traceback Error:

Traceback (most recent call last):
  File "C:/Users/Lewis Collins/PycharmProjects/Test/main.py", line 115, in <module>
    languagemodel()
  File "C:/Users/Lewis Collins/PycharmProjects/Test/main.py", line 92, in languagemodel
    patterns = r"\b(" + re.escape(lword) + r")\b", r"\b(" + re.escape(cword) + r")\b"  # pattern to match containing Lword
  File "C:\Users\Lewis Collins\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 258, in escape
    for c in pattern:
TypeError: 'bool' object is not iterable
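A likely source of the bool, judging from the code as posted: Python reads `for lword in cleanLex, cword in CategoryGA:` as a loop over the two-item tuple `(cleanLex, cword in CategoryGA)`, so `lword` is rebound first to the whole list and then to the True/False result of `cword in CategoryGA`. On the next pass, `re.escape(lword)` receives that bool. The sketch below reproduces this with hypothetical stand-in lists; the exact TypeError message may differ between Python versions.

```python
import re

clean_lex = ["hello", "sleep"]  # hypothetical stand-in for cleanLex
category = ["early"]            # hypothetical stand-in for CategoryGA
lword, cword = "sleep", "early"

# The loop header iterates over the tuple (clean_lex, cword in category),
# so lword is rebound to the list and then to a bool.
seen = []
for lword in clean_lex, cword in category:
    seen.append(lword)

print(seen)

# lword is now a bool, so escaping it fails with a TypeError.
try:
    re.escape(lword)
    escape_failed = False
except TypeError:
    escape_failed = True

print(escape_failed)
```

Dropping that inner loop, or using loop variable names that differ from the outer `lword` and `cword`, would keep the outer values intact.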

All help and advice is appreciated. I've tried to be as clear as possible about what I'm trying to achieve and the problems that are occurring. If you feel I'm missing something in the description, please tell me.

Thanks
