Wednesday, August 25, 2021

Regular expression: elif statement doesn't return desired output

I use regular expressions to split a text string obtained by parsing PDFs.

        que = re.compile(r'I ung:(.*\?)', flags = re.DOTALL)
        que_stion = que.search(text)
        
        if re.search('laut:', text):
            vor = re.compile(r'laut:(.*)(Iung:)', flags = re.DOTALL)
            vor_be = vor.search(text)
        elif re.search('Iung:', text):
            topic = re.compile(r'\n \n(.*) \n \n', flags = re.DOTALL | re.MULTILINE)
            topic_resp = topic.search(text)
            vor = re.compile(r'(.*)(Iung:)', flags = re.DOTALL | re.MULTILINE)
            vor_be = vor.search(text)
        elif re.search('Iung:', text):
            topic = re.compile(r' \n\n(.*) \n\n', flags = re.DOTALL | re.MULTILINE)
            topic_resp = topic.search(text)
            vor = re.compile(r' \n\n(.*)(Iung:)', flags = re.DOTALL | re.MULTILINE)
            vor_be = vor.search(text)
        else:
            vor_be = None

        data.append([que_stion.group(1), vor_be.group(1)])

output:

[['\n\naa', '\n\naa\n\n'], [' \n\naaa', 'aa \t\n\naa \n\naa \n\naa \n\naa \n\naa \naa\naa \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n']]

desired output:

[['\n\naa', '\n\naa\n\n'], [' \n\naaa', ' \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n']]

FYI: the second list item with zzz is a result of the second elif statement.
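As the snippet is written, the two elif branches test the same condition, re.search('Iung:', text). Python stops at the first true condition, so the second elif can never run. The long string in the output therefore seems to come from the first elif. In that branch, the greedy (.*) with re.DOTALL captures everything from the start of the string up to "Iung:". The sketch below checks this against a hypothetical input rebuilt from the posted output, because the real PDF text is not shown. which_branch is an illustrative helper, not part of the original code.

```python
import re

# Hypothetical input rebuilt from the posted output; the real PDF text is not shown.
text = ('aa \t\n\naa \n\naa \n\naa \n\naa \n\naa \naa\naa \n\n'
        'zzz \nzzz \nzzz \nzzz \nzzz \n\nIung: \n\naaa?')

def which_branch(source):
    # Hypothetical helper: same conditions, in the same order, as the question's chain.
    if re.search('laut:', source):
        return 'if'
    elif re.search('Iung:', source):
        return 'first elif'
    elif re.search('Iung:', source):  # identical test to the branch above: unreachable
        return 'second elif'
    else:
        return 'else'

print(which_branch(text))  # first elif

# The first elif's pattern: with DOTALL, the greedy (.*) runs from the very
# start of the string up to "Iung:", reproducing the long string in the output.
vor_be = re.search(r'(.*)(Iung:)', text, flags=re.DOTALL)
print(repr(vor_be.group(1)))
```

Editing the second elif therefore has no effect until its condition differs from the first one, or until the first branch's pattern itself is changed.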

When I try to adjust the second elif statement, either the output doesn't change or I get the error 'NoneType' object has no attribute 'group'. Is my mistake that I keep calling .group(1) even though, in the second elif statement, the text I want is something else? Or is the problem that I use search when I only want one match, even though ' \n\nzzz \n\n' occurs more than once?
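Using search should not be the problem: re.search already stops at the first (leftmost) match. What matters is what the pattern is allowed to capture. One option is to forbid the captured text from crossing a blank line. Then only the last paragraph before "Iung:" can match. The sketch below assumes the same hypothetical input as above. The helper name block_before_iung is illustrative. It also checks for None before calling .group(1), which avoids the AttributeError.

```python
import re

# Hypothetical input rebuilt from the posted output; the real PDF text is not shown.
text = ('aa \t\n\naa \n\naa \n\naa \n\naa \n\naa \naa\naa \n\n'
        'zzz \nzzz \nzzz \nzzz \nzzz \n\nIung: \n\naaa?')

# [ \t]*\n\n        optional trailing spaces, then the blank line opening the block
# (?:(?!\n\n).)*    any characters (DOTALL lets "." match a single "\n"),
#                   but never across another blank line
# \n\n(?=Iung:)     the blank line directly before "Iung:" (lookahead, not captured)
last_block = re.compile(r'([ \t]*\n\n(?:(?!\n\n).)*\n\n)(?=Iung:)', flags=re.DOTALL)

def block_before_iung(source):
    # Hypothetical helper. re.search returns None when nothing matches; checking
    # for that avoids AttributeError: 'NoneType' object has no attribute 'group'.
    match = last_block.search(source)
    return match.group(1) if match is not None else None

print(repr(block_before_iung(text)))        # ' \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n'
print(block_before_iung('no marker here'))  # None
```

This "tempered" construction, (?:(?!\n\n).)*, is one way to restrict .* without relying on non-greedy quantifiers. A non-greedy (.*?) alone would not be enough, because the match would still start at the first blank line in the text. If the real PDF text separates paragraphs differently (for example with ' \n \n'), the blank-line part of the pattern would need adjusting.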
