Tuesday, 2 February 2021

How to filter a row based on multiple indexes and multiple conditions?

I have a file which looks like this:

#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-10,3,London,London,Manchester,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevila,Sevilla,1,1,1
2020-09-07T00:00:03.230+02:00,ID-30,GGG,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00,ID-40,GGG,Madrid,Barcelona,1,1,1,1
2020-09-07T00:00:03.230+02:00
2020-09-07T00:00:03.230+02:00

Index[2] in each row shows how many cities are present in that specific row. So the first row has the value 3 at index[2], and its cities are London, Manchester, London.

I am trying to do the following:

  1. For every row, I need to check whether row[3] or any of the cities listed after it (as many as the count in row[2] indicates) is present in cities_to_filter. This check only applies if row[2] is a number. I also need to handle the fact that some rows contain fewer than 2 items.
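The rule described above could be sketched roughly as follows. This is only an illustration under assumptions: the name `row_matches` and its `wanted` parameter are hypothetical, and rows that are too short or have a non-numeric count are simply treated as non-matching.

```python
cities_to_filter = ['Sevilla', 'Manchester']

def row_matches(row, wanted=cities_to_filter):
    """Return True if any counted city in the row is in `wanted`."""
    # Rows with fewer than 4 fields cannot hold a count plus at least one city.
    if len(row) < 4:
        return False
    # Skip rows whose count field is not made of digits (e.g. 'GGG').
    if not row[2].isdigit():
        return False
    amount_of_cities = int(row[2])
    # The cities start at index 3 and run for `amount_of_cities` fields.
    cities_to_check = row[3:3 + amount_of_cities]
    return any(city in wanted for city in cities_to_check)
```

With the sample data, the first ID-10 row would match because of Manchester, while the ID-30 row whose count is GGG would be skipped without raising an error.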

This is my code:

import csv

path = r'c:\data\ELK\Desktop\test_data_countries.txt'

cities_to_filter = ['Sevilla', 'Manchester']

def filter_row(row):
    if row[2].isdigit():
        amount_of_cities = int(row[2]) if len(row) > 2 else True
        
    cities_to_check = row[3:3+amount_of_cities]
    
    condition_1 =  any(city in cities_to_check for city in cities_to_filter)    
    return condition_1

with open (path, 'r') as output_file:
    reader = csv.reader(output_file, delimiter = ',')
    next(reader)
    for row in reader:
        amount_of_cities = int(row[2])
        cities_to_check = row[3:3+amount_of_cities]
        print(cities_to_check)
        if filter_row(row):
            print(row)

Right now I receive the following error:

ValueError: invalid literal for int() with base 10: 'GGG'
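The traceback appears to come from the `int(row[2])` call inside the `for` loop, which runs on every row before `filter_row` is reached and has no `isdigit` check of its own. One possible way to guard that conversion is sketched below; the helper name `parse_count` is hypothetical, and `io.StringIO` only stands in for the real file so the example runs on its own.

```python
import csv
import io

# A few lines copied from the sample data; io.StringIO replaces the real file.
sample = """#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-30,GGG,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00
"""

def parse_count(row):
    """Return row[2] as an int, or None if it is missing or not a number."""
    try:
        return int(row[2])
    except (IndexError, ValueError):
        # IndexError: the row has fewer than 3 fields.
        # ValueError: the field is not a valid integer, e.g. 'GGG'.
        return None

reader = csv.reader(io.StringIO(sample), delimiter=',')
next(reader)  # skip the '#This is TEST-data' comment line
counts = [parse_count(row) for row in reader]
print(counts)  # [3, None, None]
```

Returning `None` instead of raising lets the loop decide what to do with such rows, for example skipping them with `continue`.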
