Tuesday, December 8, 2020

Group and process files based on their date?

This is my directory structure:

% tree /tmp/test
/tmp/test
├── dir_1
│   ├── XX_20201201.txt
│   └── XX_20201202.txt
├── dir_2
│   ├── YY_20201201.txt
│   └── YY_20201202.txt
└── dir_3
    ├── ZZ_20201201.txt
    ├── ZZ_20201202.txt
    └── ZZ_20201203.txt

3 directories, 7 files

With the code below I intend to filter the files and merge 3 files into 1 based on their date. I also check whether that date is present in the list missing_dates. Based on that, I expected 2 files as a result: 1 file for 20201201 and 1 file for 20201202, because those dates are in missing_dates and are present in every directory.
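
To make the intended grouping concrete, here is a minimal sketch of that step on its own. The function name `group_files_by_date`, the parameter `expected_count`, and the assumption that every file is named `<prefix>_<YYYYMMDD>.txt` are illustrative and not part of my real code.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <prefix>_<YYYYMMDD>.txt
DATE_RE = re.compile(r"_(\d{8})\.txt$")


def group_files_by_date(root, wanted_dates, expected_count):
    """Return {date: [paths]} for wanted dates found in exactly expected_count files."""
    by_date = defaultdict(list)
    for path in sorted(Path(root).rglob("*.txt")):
        match = DATE_RE.search(path.name)
        if path.is_file() and match and match.group(1) in wanted_dates:
            by_date[match.group(1)].append(path)
    # Keep only dates that occur once per directory (expected_count files in total).
    return {d: files for d, files in by_date.items() if len(files) == expected_count}
```

For the tree above with `expected_count=3`, this would keep 20201201 and 20201202 and drop 20201203, which only appears in dir_3.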

My code:

missing_dates = ['20201201', '20201202', '20201203']

root=Path(r'c:\data\FF\Desktop\new_location\counterpart')
for d in missing_dates:
    print(f"processing {d}")
    files=[fn for fn in (e for e in root.glob(f"**/*_{d}.txt") if e.is_file())]
    if len(files)==3:   #<-- check if you have a total of 3 files of the same date.
        for file in files:  
            name_file = ntpath.basename(file)            
            date_file = re.search(r'_(\d+)\.', name_file).group(1) #<-- get the date of the file
            with open(file, 'r') as my_file:  #<-- open the files, read them and process them.
                reader = csv.reader(my_file, delimiter = ',')
                next(reader)
                for row in reader:
                    if filter_row(row):                      
                        vehicle_loc_dict[(row[9], location_token(row))].append(row)
                                                    
                                                
with open(my_files + '\\' + 'File_X' + '\\' + 'Vehicle_' +  date_file + '.txt', 'w') as output:
    writer = csv.writer(output, delimiter = '\t')
    for vehicle_loc_list in vehicle_loc_dict.values():
        for record_group in group_records(vehicle_loc_list):
            writer.writerow(output_record(record_group))

Now if I open the path my_files\File_X, I find just 1 file, named Vehicle_20201202.txt. I think it merged the 6 files into 1 big file instead of merging them into 2 files based on their date.

I am really struggling to fix this and I hope someone can help me. Please note that my code also contains a lot of other functions, but they are not relevant for this case.
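
One likely explanation, offered as a guess rather than a confirmed diagnosis: the `with open(...)` block is not indented under `for d in missing_dates`, so it runs only once, after the loop, with whatever value `date_file` last held, and `vehicle_loc_dict` has by then collected rows from every date. The sketch below shows the alternative shape, with a fresh accumulator per date and the write step inside the loop. The names `merge_by_date`, `out_dir`, and `rows_by_key` are hypothetical, rows are keyed by their first column purely for illustration, and `filter_row`, `group_records`, and `output_record` are left out.

```python
import csv
from collections import defaultdict
from pathlib import Path


def merge_by_date(root, out_dir, dates, expected_count=3):
    """Write one merged file per date; return the list of files written."""
    root, out_dir = Path(root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for d in dates:
        files = sorted(p for p in root.glob(f"**/*_{d}.txt") if p.is_file())
        if len(files) != expected_count:
            continue  # this date is not present in every directory
        rows_by_key = defaultdict(list)  # fresh accumulator for this date only
        for file in files:
            with open(file, newline="") as src:
                reader = csv.reader(src, delimiter=",")
                next(reader, None)  # skip the header row
                for row in reader:
                    if row:
                        rows_by_key[row[0]].append(row)  # assumed key: first column
        # The output is written inside the date loop, so each date gets its own file.
        target = out_dir / f"Vehicle_{d}.txt"
        with open(target, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t")
            for rows in rows_by_key.values():
                writer.writerows(rows)
        written.append(target)
    return written
```

Note that `out_dir` should live outside `root` here, otherwise a second run would also pick up the generated `Vehicle_<date>.txt` files through the glob.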
