This is my structure:
% tree /tmp/test
/tmp/test
├── dir_1
│   ├── XX_20201201.txt
│   └── XX_20201202.txt
├── dir_2
│   ├── YY_20201201.txt
│   └── YY_20201202.txt
└── dir_3
    ├── ZZ_20201201.txt
    ├── ZZ_20201202.txt
    └── ZZ_20201203.txt
3 directories, 7 files
With the code below I intend to filter the files by date and merge the 3 files that share a date into 1 output file. I also check whether that date is present in a list missing_dates. Based on that I expected 2 output files as a result: one for 20201201 and one for 20201202, because those dates are present in missing_dates and they exist in every directory (20201203 only exists in dir_3).
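To make the expected grouping concrete, here is a small standalone check against the sample /tmp/test layout above (just an illustration, not my real code) of what each date's glob should pick up:

from pathlib import Path

missing_dates = ['20201201', '20201202', '20201203']
root = Path('/tmp/test')  # the sample layout shown above

for d in missing_dates:
    files = sorted(p.name for p in root.glob(f"**/*_{d}.txt") if p.is_file())
    print(d, files)

# what I expect this to print:
# 20201201 ['XX_20201201.txt', 'YY_20201201.txt', 'ZZ_20201201.txt']
# 20201202 ['XX_20201202.txt', 'YY_20201202.txt', 'ZZ_20201202.txt']
# 20201203 ['ZZ_20201203.txt']   <- only 1 file, so this date should be skipped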
My code:
import csv
import ntpath
import re
from pathlib import Path

# vehicle_loc_dict, my_files and the helpers (filter_row, location_token,
# group_records, output_record) are defined elsewhere in my script

missing_dates = ['20201201', '20201202', '20201203']
root = Path(r'c:\data\FF\Desktop\new_location\counterpart')

for d in missing_dates:
    print(f"processing {d}")
    files = [fn for fn in (e for e in root.glob(f"**/*_{d}.txt") if e.is_file())]
    if len(files) == 3:  # <-- check if you have a total of 3 files of the same date
        for file in files:
            name_file = ntpath.basename(file)
            date_file = re.search('_(.\d+).', name_file).group(1)  # <-- get the date of the file
            with open(file, 'r') as my_file:  # <-- open the files, read them and process them
                reader = csv.reader(my_file, delimiter=',')
                next(reader)
                for row in reader:
                    if filter_row(row):
                        vehicle_loc_dict[(row[9], location_token(row))].append(row)

with open(my_files + '\\' + 'File_X' + '\\' + 'Vehicle_' + date_file + '.txt', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    for vehicle_loc_list in vehicle_loc_dict.values():
        for record_group in group_records(vehicle_loc_list):
            writer.writerow(output_record(record_group))
Now if I open the path my_files\File_X I find just 1 file, named Vehicle_20201202.txt. It looks like the code merged the 6 files into 1 big file instead of merging them into 2 files based on their date.
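For what it's worth, this is the per-date shape I think the loop needs: a fresh dict and exactly one output file written per date. It is only a rough sketch (my helpers filter_row, location_token, group_records, output_record and the my_files path are defined elsewhere), and I don't see how to get my code above into this shape:

from collections import defaultdict

for d in missing_dates:
    files = [p for p in root.glob(f"**/*_{d}.txt") if p.is_file()]
    if len(files) != 3:                    # skip dates that are not in every directory
        continue
    vehicle_loc_dict = defaultdict(list)   # start empty for every date
    for file in files:
        with open(file, 'r', newline='') as my_file:
            reader = csv.reader(my_file, delimiter=',')
            next(reader)                   # skip the header row
            for row in reader:
                if filter_row(row):
                    vehicle_loc_dict[(row[9], location_token(row))].append(row)
    # write exactly one output file per date, inside the date loop
    out_path = Path(my_files) / 'File_X' / f'Vehicle_{d}.txt'
    with open(out_path, 'w', newline='') as output:
        writer = csv.writer(output, delimiter='\t')
        for vehicle_loc_list in vehicle_loc_dict.values():
            for record_group in group_records(vehicle_loc_list):
                writer.writerow(output_record(record_group))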
I am really struggling to fix this and I hope someone can help me. Please note that my code also contains a lot of other functions, but they are not relevant to this case.