I have a file which looks like this:
#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-10,3,London,London,Manchester,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevila,Sevilla,1,1,1
2020-09-07T00:00:03.230+02:00,ID-30,GGG,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00,ID-40,GGG,Madrid,Barcelona,1,1,1,1
2020-09-07T00:00:03.230+02:00
2020-09-07T00:00:03.230+02:00
Index [2] in each row shows how many cities are present in that specific row. So the first row has the value 3 at index [2], and its cities are London, Manchester, London.
I am trying to do the following:
- For every row, I need to check whether any of the cities starting at row[3] (as many as the count in row[2] indicates) are present in cities_to_filter. This only needs to be done if row[2] is a number. I also need to handle the fact that some rows contain fewer than 2 items.
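The rule described above can be sketched as a single predicate. This is only a minimal sketch, not a definitive implementation: the name row_matches is hypothetical, and it assumes each row is already a list of strings such as csv.reader produces.

```python
def row_matches(row, cities_to_filter):
    """Return True if any of the row's listed cities is in cities_to_filter."""
    # Rows without a count field (fewer than 3 items) can never match.
    if len(row) < 3:
        return False
    # Only rows whose count field is a plain number are checked.
    if not row[2].isdigit():
        return False
    amount_of_cities = int(row[2])
    # The cities start at index 3 and run for amount_of_cities items.
    cities_to_check = row[3:3 + amount_of_cities]
    return any(city in cities_to_filter for city in cities_to_check)
```

Checking the length before touching row[2] means the timestamp-only rows are rejected without raising an IndexError.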
This is my code:
import csv

path = r'c:\data\ELK\Desktop\test_data_countries.txt'
cities_to_filter = ['Sevilla', 'Manchester']

def filter_row(row):
    if row[2].isdigit():
        amount_of_cities = int(row[2]) if len(row) > 2 else True
        cities_to_check = row[3:3+amount_of_cities]
        condition_1 = any(city in cities_to_check for city in cities_to_filter)
        return condition_1

with open(path, 'r') as output_file:
    reader = csv.reader(output_file, delimiter=',')
    next(reader)  # skip the comment line at the top of the file
    for row in reader:
        amount_of_cities = int(row[2])
        cities_to_check = row[3:3+amount_of_cities]
        print(cities_to_check)
        if filter_row(row):
            print(row)
Right now I receive the following error:
ValueError: invalid literal for int() with base 10: 'GGG'
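The error comes from the int(row[2]) call inside the for loop, which runs on every row before filter_row can apply its isdigit() check. One possible way around it, shown here as a hedged sketch, is to convert the count in one place and skip rows where that fails. The helper name city_count and the in-memory sample are assumptions made for illustration; the real code would read from path instead.

```python
import csv
import io

# Hypothetical in-memory sample standing in for the real file.
sample = """#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London,1,1
2020-09-07T00:00:03.230+02:00,ID-30,GGG,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00
"""

cities_to_filter = ['Sevilla', 'Manchester']

def city_count(row):
    """Return row[2] as an int, or None if it is missing or not a number."""
    try:
        return int(row[2])
    except (IndexError, ValueError):
        return None

matches = []
reader = csv.reader(io.StringIO(sample), delimiter=',')
next(reader)  # skip the comment line
for row in reader:
    count = city_count(row)
    if count is None:
        continue  # short row, or a non-numeric count such as 'GGG'
    cities_to_check = row[3:3 + count]
    if any(city in cities_to_filter for city in cities_to_check):
        matches.append(row)

for row in matches:
    print(row)
```

Catching IndexError and ValueError together covers both the timestamp-only rows and the 'GGG' rows with a single guard.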