Monday, 2 September 2019

Conditional writing and parsing of files (if elif else)

I'd like to parse some files and write the text matched by a regex to a new file. The files use several different formulations, so I need to check whether any of my regexes matches. If one does, I want to use that regex and write the result to a file in a new folder, but if none of my regexes matches, I want to see the corresponding file name (appended to a list). I don't know how to combine the if, elif and else statements to do this.

So I read about it and tried:

import re

some_folder = "C:/Users/Folder/"
reg1 = r"some regex"
reg2 = r"some regex2"

error_list = []

for file in files:

  with open(file,'r', encoding='utf-8') as in_file:     
      with open(some_folder+name,'w',encoding='utf-8') as n_file:
        content = in_file.read().lower()
        if re.match(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE):
            matches_reg1 = re.findall(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
            result = max(matches_reg1, key=len)
            result = str(result).replace('\n', '')
            n_file.write(result)
        elif re.match(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE):
            matches_reg2 = re.findall(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
            result = max(matches_reg2, key=len)
            result = str(result).replace('\n', '')
            n_file.write(result)
        else:
            error_list.append(name)
            print("ERROR: ", name)

But this does not work at all. The following worked better, but it seems inefficient and does not show the error files, only the ones for the first regex:

for file in files:
    with open(file, 'r', encoding='utf-8') as in_file:
        with open(some_folder + name, 'w', encoding='utf-8') as n_file:
            content = in_file.read().lower()
            matches_reg1 = re.findall(reg1, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)
            matches_reg2 = re.findall(reg2, content, re.IGNORECASE | re.DOTALL | re.MULTILINE)

            if matches_reg1:
                result = max(matches_reg1, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            if matches_reg2:
                result = max(matches_reg2, key=len)
                result = str(result).replace('\n', '')
                n_file.write(result)
            else:
                error_list.append(name)
                print("ERROR: ", name)

I also considered the following, but honestly, can someone explain an efficient way to deal with this?

matches_reg1 = re.findall(reg1,..)
if matches_reg1:
    ...
elif matches_reg1:
    match = re.findall(reg2, ...)
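
For reference, here is a minimal sketch of one way the behaviour described above could be structured; it is not a definitive solution. It tries the patterns in order and stops at the first one that finds anything, so the if/elif chain becomes a loop. It only creates the output file once a match exists, and otherwise records the file name. Two facts about the standard `re` module matter here: `re.match` only matches at the very start of the string (even with `re.MULTILINE`), while `re.finditer` and `re.findall` scan the whole text. The names `FLAGS`, `longest_match` and `process_files` are hypothetical, and since `name` is not defined in the snippets above, the sketch assumes the output file should have the same name as the input file. It also assumes the full matched text is wanted rather than the contents of a capture group.

```python
import re
from pathlib import Path

# Same flags as in the question, combined once.
FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE


def longest_match(content, patterns):
    """Return the longest match of the first pattern that matches, else None."""
    for pattern in patterns:
        # group(0) is always the full matched text, whereas findall
        # returns group contents when the pattern contains groups.
        found = [m.group(0) for m in re.finditer(pattern, content, FLAGS)]
        if found:
            # Later patterns are not tried once one has matched.
            return max(found, key=len).replace("\n", "")
    return None


def process_files(files, out_folder, patterns):
    """Write each file's match to out_folder; return names without a match."""
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    error_list = []
    for file in files:
        path = Path(file)
        content = path.read_text(encoding="utf-8").lower()
        result = longest_match(content, patterns)
        if result is None:
            # No pattern matched: remember the name, write nothing.
            error_list.append(path.name)
            print("ERROR:", path.name)
        else:
            # Assumption: the output keeps the input file's name.
            (out_folder / path.name).write_text(result, encoding="utf-8")
    return error_list
```

With the variables from the question, a call might look like `error_list = process_files(files, some_folder, [reg1, reg2])`. Adding a third formulation then only means appending another pattern to the list.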
