Excluding directories in os.walk
I'm writing a script that descends into a directory tree (using os.walk()) and then visits each file matching a certain file extension. However, since some of the directory trees that my tool will be used on also contain sub directories that in turn contain a LOT of useless (for the purpose of this script) stuff, I figured I'd add an option for the user to specify a list of directories to exclude from the traversal.
This is easy enough with os.walk(). After all, it's up to me to decide whether I actually want to visit the respective files / dirs yielded by os.walk() or just skip them. The problem is that if I have, for example, a directory tree like this:
root-- | --- dirA | --- dirB | --- uselessStuff -- | --- moreJunk | --- yetMoreJunk
and I want to exclude uselessStuff and all its children, os.walk() will still descend into all the (potentially thousands of) sub directories of uselessStuff , which, needless to say, slows things down a lot. In an ideal world, I could tell os.walk() to not even bother yielding any more children of uselessStuff , but to my knowledge there is no way of doing that (is there?).
Does anyone have an idea? Maybe there's a third-party library that provides something like that?
dirs in-place will prune the (subsequent) files and directories visited by
# exclude = set([...]) for root, dirs, files in os.walk(top, topdown=True): dirs[:] = [d for d in dirs if d not in exclude]
When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...