Friday, December 21, 2018

Using zipfile to archive directory contents while skipping files from list

I'm using zipfile to create an archive of all files in a directory (recursively, while preserving directory structure including empty folders) and want the process to skip the filenames specified in a list.

This is the basic function. It walks a directory with os.walk and adds every file and directory it contains to an archive.

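import os
import zipfile

def zip_dir(path):
    # Name the archive after the last path component ('foo/bar/' -> 'bar.zip').
    zipname = os.path.basename(os.path.normpath(path)) + '.zip'
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED) as zf:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                for file_or_dir in files + dirs:
                    # Store names relative to the parent of path so the
                    # top-level folder is kept inside the archive.
                    zf.write(os.path.join(root, file_or_dir),
                             os.path.relpath(os.path.join(root, file_or_dir),
                                             os.path.join(path, os.path.pardir)))
        elif os.path.isfile(path):
            # Single file: store it under its bare filename.
            zf.write(path, os.path.basename(path))
        # The with block closes the archive, so no explicit close() is needed.
        zf.printdir()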

The code is also meant to handle single files, but it is mainly the directory branch that I am interested in here.

Now let's say we have a list of filenames that we want to exclude from being added to the zip archive.

skiplist = ['.DS_Store', 'tempfile.tmp']

What is the best and cleanest way to achieve this?

I tried using the built-in zip() function, which was somewhat successful but causes empty folders to be excluded for some reason (empty folders should be included). I'm not sure why this happens.

skiplist = ['.DS_Store', 'tempfile.tmp']
for root, dirs, files in os.walk(path):
    for (file_or_dir, skipname) in zip(files + dirs, skiplist):
        if skipname not in file_or_dir:
            zf.write(os.path.join(root, file_or_dir),
                    os.path.relpath(os.path.join(root, file_or_dir),
                    os.path.join(path, os.path.pardir)))
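
For reference, here is a small sketch of what zip() does with these two lists. zip() pairs items by position and stops at the end of the shorter sequence, so with a two-item skiplist only the first two entries of files + dirs are ever examined, and each is compared against just one skip name. Since dirs come after files, folders are the first to be dropped. The entries list below is made up purely for illustration.

```python
skiplist = ['.DS_Store', 'tempfile.tmp']
# Hypothetical contents of files + dirs for one directory.
entries = ['a.txt', '.DS_Store', 'b.txt', 'emptydir']

# zip() pairs by position and truncates to the shorter input.
pairs = list(zip(entries, skiplist))
print(pairs)

# A plain membership test checks every entry against the whole skiplist.
kept = [name for name in entries if name not in skiplist]
print(kept)
```

A membership test like the one on the last lines is probably closer to what the loop was meant to do.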

It would also be interesting to see if anyone has a clever idea for adding the ability to skip specific file extensions, perhaps something like .endswith('.png') but I'm not entirely sure of how to incorporate it together with the existing skiplist.
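
One possible way to combine both kinds of filter, as a rough sketch rather than a definitive answer: str.endswith accepts a tuple of suffixes, so the extension check can sit next to the name check. The function name zip_dir_filtered and the parameters zipname, skip_names and skip_exts are hypothetical, chosen only for this example. Here the extension filter is applied to files only, and the name filter to both files and folders.

```python
import os
import zipfile

def zip_dir_filtered(path, zipname, skip_names=(), skip_exts=()):
    """Zip a directory tree, skipping given names and file extensions."""
    skip_names = set(skip_names)
    skip_exts = tuple(skip_exts)  # endswith() needs a str or a tuple
    parent = os.path.join(path, os.path.pardir)
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            keep_files = [f for f in files
                          if f not in skip_names and not f.endswith(skip_exts)]
            keep_dirs = [d for d in dirs if d not in skip_names]
            for name in keep_files + keep_dirs:
                full = os.path.join(root, name)
                # Directories are written as entries too, so empty
                # folders survive in the archive.
                zf.write(full, os.path.relpath(full, parent))

# Example call (paths are placeholders):
# zip_dir_filtered('myproject', 'myproject.zip',
#                  skip_names=['.DS_Store', 'tempfile.tmp'],
#                  skip_exts=['.png'])
```

Note that a skipped folder is still walked in this sketch, so files inside it would still be added. Pruning it in place (for example `dirs[:] = keep_dirs`) would stop os.walk from descending into it.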

I would also appreciate any general comments on the function, in particular whether it works as expected without surprises, as well as any suggestions for optimizations or improvements.
