Excluding directories in os.walk

I'm writing a script that descends into a directory tree (using os.walk()) and then visits each file matching a certain file extension. However, since some of the directory trees that my tool will be used on also contain sub directories that in turn contain a LOT of useless (for the purpose of this script) stuff, I figured I'd add an option for the user to specify a list of directories to exclude from the traversal.

This is easy enough with os.walk(). After all, it's up to me to decide whether I actually want to visit the respective files / dirs yielded by os.walk() or just skip them. The problem is that if I have, for example, a directory tree like this:

         --- dirA
         --- dirB
         --- uselessStuff --
                           --- moreJunk
                           --- yetMoreJunk

and I want to exclude uselessStuff and all its children, os.walk() will still descend into all the (potentially thousands of) sub directories of uselessStuff , which, needless to say, slows things down a lot. In an ideal world, I could tell os.walk() to not even bother yielding any more children of uselessStuff , but to my knowledge there is no way of doing that (is there?).

Does anyone have an idea? Maybe there's a third-party library that provides something like that?

Modifying dirs in-place will prune the (subsequent) files and directories visited by os.walk:

    # exclude = set([...])
    for root, dirs, files in os.walk(top, topdown=True):
        dirs[:] = [d for d in dirs if d not in exclude]

From help(os.walk):

When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...

From: stackoverflow.com/q/19859840

Back to homepage or read more recommendations: