Thursday, January 7, 2021

Create a function that reads a CSV file line by line and loads the lines that meet two different regex conditions

I want to create a function that reads a CSV file line by line and loads the lines that meet two different regex conditions. The first condition keeps the lines that include a Roman numeral (written with the characters IVXLCDM). From the lines that meet that condition, I then need to filter out the ones that match the pattern .od.s, such as the line containing "Todos".

So if I have a csv file like this:

547 I. Line 1 
479 II. Todos Line 2
897 Line 3
879 XI. Line 4

It should only load these lines:

547 I. Line 1 
879 XI. Line 4
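To make the two conditions concrete, here is a minimal sketch that applies them to the four sample lines held in memory. The Roman-numeral pattern is an assumption based on the sample layout (three digits, whitespace, a numeral, then a period). Anchoring it matters: a bare character class like [IVXLCDM] would also match the capital "L" in "Line 3". The exclusion pattern .od.s is used as given; each "." in it is a regex wildcard, which is why it matches "Todos".

```python
import re

# Assumed format: three digits, whitespace, a Roman numeral, then a period (e.g. "547 I.").
# Anchoring with ^ keeps the capital "L" in "Line" from counting as a numeral.
roman = re.compile(r"^\d{3}\s+[IVXLCDM]+\.")
# Pattern taken from the question; each "." is a wildcard, so it matches "Todos".
exclude = re.compile(r".od.s")

sample = [
    "547 I. Line 1",
    "479 II. Todos Line 2",
    "897 Line 3",
    "879 XI. Line 4",
]

for line in sample:
    # Report which of the two conditions each line satisfies
    print(line, "| roman:", bool(roman.search(line)), "| excluded:", bool(exclude.search(line)))

# Keep lines that pass the first condition and do not match the second
kept = [line for line in sample if roman.search(line) and not exclude.search(line)]
print(kept)  # ['547 I. Line 1', '879 XI. Line 4']
```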

So far I have this:

def load_file(file_extension):
    import re
    file = open(file_extension, 'r')
    filter1 = re.compile(r"\d{3}\s+.([.IVXLCDM.]+)")
    filter2 = re.compile(r".od.s")
    final_list = []
    for line in file:
        if re.search(filter1, line):
            if not re.search(filter2, line):
                final_list.append(line)
        return final_list
    file.close()

print(load_file('file.csv'))

But it keeps returning an empty list.
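A likely cause, judging from the code alone: return(final_list) is indented inside the for loop, so the function returns after looking at only the first line, and file.close() is never reached. If the first line of the real file does not pass filter1 (a header row, for instance), the list comes back empty. Separately, filter1 relies on the unescaped "." after \s+ to swallow the first numeral character, which happens to work for this sample but is fragile. The sketch below moves the return after the loop and uses a with block so the file is always closed. It assumes the lines follow the sample layout and strips the trailing newline from each kept line; both are assumptions, not requirements stated in the question.

```python
import re

# Assumption: qualifying lines look like "547 I. ..." (three digits, whitespace,
# Roman numeral, period), as in the sample file.
ROMAN = re.compile(r"^\d{3}\s+[IVXLCDM]+\.")
# Pattern from the question; "." is a wildcard, so this matches "Todos".
EXCLUDE = re.compile(r".od.s")


def load_file(file_path):
    final_list = []
    # "with" closes the file even if an exception is raised
    with open(file_path, "r") as file:
        for line in file:
            # Drop the trailing newline so results print cleanly
            line = line.rstrip("\n")
            if ROMAN.search(line) and not EXCLUDE.search(line):
                final_list.append(line)
    # Return only after the loop has processed every line
    return final_list
```

Called as print(load_file('file.csv')), as in the question, this should print the two expected lines for the sample file.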

I am not sure whether this can be done in a single function. I also tried creating two separate functions: one that filters a list with both regex conditions, and another that calls the first one when it reads the CSV file. That didn't work either.
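Splitting the work into two functions can also work, provided each function returns only after it has processed all of its input. In this sketch, filter_lines is a hypothetical name for the function that applies both regex conditions to a list of strings, and load_file reads the whole file and passes its lines to it. The Roman-numeral pattern is an assumption based on the sample layout (three digits, whitespace, a numeral, then a period).

```python
import re

# Assumed layout: three digits, whitespace, a Roman numeral, then a period.
ROMAN = re.compile(r"^\d{3}\s+[IVXLCDM]+\.")
# Pattern from the question; each "." is a regex wildcard.
EXCLUDE = re.compile(r".od.s")


def filter_lines(lines):
    """Hypothetical helper: keep lines with a Roman numeral, drop lines matching .od.s."""
    return [line for line in lines if ROMAN.search(line) and not EXCLUDE.search(line)]


def load_file(file_path):
    # Read every line first, stripping trailing newlines, then filter the whole list
    with open(file_path, "r") as file:
        lines = [line.rstrip("\n") for line in file]
    return filter_lines(lines)
```

Because filter_lines accepts any list of strings, it can be checked against the sample lines directly, without creating a file.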
