I'd like to parse some files and write the text matched by a regex to a new file. The files contain several different formulations, so I need to check whether any of my regexes matches. If one does, I want to use that regex and write the result to a file in a new folder. But if none of my regexes matches, I want to see the corresponding file name (appended to a list). I don't know how to combine the if, elif and else statements accordingly.
So I read about it and tried:
import re

some_folder = "C:/Users/Folder/"
reg1 = r"some regex"
reg2 = r"some regex2"
error_list = []
# `files` and `name` are not shown here
for file in files:
    with open(file, 'r', encoding='utf-8') as in_file:
        with open(some_folder + name, 'w', encoding='utf-8') as n_file:
            content = in_file.read().lower()
            if re.match(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE):
                matches_reg1 = re.findall(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
                result = max(matches_reg1, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            elif re.match(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE):
                matches_reg2 = re.findall(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
                result = max(matches_reg2, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            else:
                error_list.append(name)
                print("ERROR: ", name)
But this absolutely does not work. What did work better was this, but it seems inefficient and does not show the error files, just the ones for the first regex:
for file in files:
    with open(file, 'r', encoding='utf-8') as in_file:
        with open(some_folder + name, 'w', encoding='utf-8') as n_file:
            content = in_file.read().lower()
            matches_reg1 = re.findall(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
            matches_reg2 = re.findall(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
            if matches_reg1:
                result = max(matches_reg1, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            if matches_reg2:
                result = max(matches_reg2, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            else:
                error_list.append(name)
                print("ERROR: ", name)
I also considered the following, but honestly, can someone explain an efficient way to deal with this?
matches_reg1 = re.findall(reg1,..)
if matches_reg1:
    ...
elif matches_reg1:
    match = re.findall(reg2, ...)
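For reference, here is a minimal sketch of what I think I am after: try each regex in order, keep the longest hit of the first one that finds anything, and only create the output file when something matched. The names `FLAGS`, `longest_match` and `extract_all`, and the use of `pathlib`, are my own assumptions for illustration and not part of the code above. It relies on `re.findall` alone, because `re.match` only matches at the very start of the string (even with `re.MULTILINE`), which may be why the first attempt fails.

```python
import re
from pathlib import Path

FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE


def longest_match(content, patterns):
    """Return the longest findall() hit of the first pattern that matches, else None."""
    for pattern in patterns:
        matches = re.findall(pattern, content, FLAGS)
        if matches:
            # findall() returns tuples when a pattern has several groups,
            # so this sketch assumes at most one capturing group per pattern.
            return max(matches, key=len).replace("\n", "")
    return None


def extract_all(files, out_folder, patterns):
    """Write each file's result into out_folder; return names of files with no match."""
    error_list = []
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    for file in files:
        path = Path(file)
        content = path.read_text(encoding="utf-8").lower()
        result = longest_match(content, patterns)
        if result is None:
            # No pattern matched: record the file name and skip writing.
            error_list.append(path.name)
            print("ERROR: ", path.name)
        else:
            (out_folder / path.name).write_text(result, encoding="utf-8")
    return error_list
```

With this shape, adding a third regex would only mean extending the list, e.g. `extract_all(files, some_folder, [reg1, reg2])`, and the returned list holds the names of the files that need a closer look.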