I use regex to split a text string obtained from parsing PDFs:
que = re.compile(r'I ung:(.*\?)', flags=re.DOTALL)
que_stion = que.search(text)
if re.search('laut:', text):
    vor = re.compile(r'laut:(.*)(Iung:)', flags=re.DOTALL)
    vor_be = vor.search(text)
elif re.search('Iung:', text):
    topic = re.compile(r'\n \n(.*) \n \n', flags=re.DOTALL | re.MULTILINE)
    topic_resp = topic.search(text)
    vor = re.compile(r'(.*)(Iung:)', flags=re.DOTALL | re.MULTILINE)
    vor_be = vor.search(text)
elif re.search('Iung:', text):
    topic = re.compile(r' \n\n(.*) \n\n', flags=re.DOTALL | re.MULTILINE)
    topic_resp = topic.search(text)
    vor = re.compile(r' \n\n(.*)(Iung:)', flags=re.DOTALL | re.MULTILINE)
    vor_be = vor.search(text)
else:
    vor_be = None
data.append([que_stion.group(1), vor_be.group(1)])
Output:
[['\n\naa', '\n\naa\n\n'], [' \n\naaa', 'aa \t\n\naa \n\naa \n\naa \n\naa \n\naa \naa\naa \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n']]
Desired output:
[['\n\naa', '\n\naa\n\n'], [' \n\naaa', ' \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n']]
FYI: the second list item (the one with zzz) is a result of the second elif branch.
When I try to adjust the second elif branch, the output either doesn't change or I get "'NoneType' object has no attribute 'group'". Is my mistake that I keep calling .group(1) even though the text I want from the second elif branch is something else? Or is the problem that I use search when I only want to match once, even though ' \n\nzzz \n\n' occurs more than once?
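To make both suspicions concrete, here is a minimal sketch, not a confirmed fix for the full script. It relies on two documented behaviours of the re module: search returns None when nothing matches, so calling .group(1) on that result raises the error above, and a greedy (.*) with re.DOTALL runs from the first possible start to the last possible end. The helper name last_block_before_marker, the constant BLOCK_BEFORE_MARKER and the sample string are assumptions made up for illustration. The pattern assumes that blocks are always delimited by ' \n\n' (space, newline, newline) and that 'Iung:' directly follows the closing delimiter.

```python
import re

# Hypothetical helper, not part of the original script.
# (?:(?! \n\n).)+ consumes characters one at a time, but refuses to step
# over another ' \n\n' delimiter, so the captured group cannot span more
# than one block. Only the block directly before 'Iung:' can match.
BLOCK_BEFORE_MARKER = re.compile(
    r'( \n\n(?:(?! \n\n).)+ \n\n)Iung:',
    flags=re.DOTALL,
)


def last_block_before_marker(text):
    """Return the ' \\n\\n'-delimited block right before 'Iung:', or None."""
    match = BLOCK_BEFORE_MARKER.search(text)
    # search() returns None when nothing matches, so check the result
    # before calling .group(1) on it.
    return match.group(1) if match else None


# Sample built from the second item of the output shown above, with the
# marker appended (an assumption about what the parsed text looks like).
sample = (
    'aa \t\n\naa \n\naa \n\naa \n\naa \n\naa \naa\naa '
    '\n\nzzz \nzzz \nzzz \nzzz \nzzz \n\nIung:'
)
result = last_block_before_marker(sample)
missing = last_block_before_marker('text without the marker')
```

With this sample, result is ' \n\nzzz \nzzz \nzzz \nzzz \nzzz \n\n' and missing is None instead of an exception. The same "check for None before .group(1)" guard would also apply to que_stion and to the else branch, where vor_be is set to None explicitly.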