Walker encapsulates os.walk's directory traversal as an object with the added features of excluded directories and a hook for calling an outside function to act on each file.
Walker can easily be subclassed for more functionality, as with ReWalker which filters filenames in traversal by a regular expression.
"""
Walker encapsulates os.walk's directory traversal as an object with the
added features of excluded directories and a hook for calling an outside
function to act on each file.

Walker can easily be subclassed for more functionality.

ReWalker filters filenames in traversal by a regular expression.

Jack Trainor 2007
"""
import os, os.path
import re

class Walker(object):
    def __init__(self, dir, executeHook=None, excludeDirs=[]):
        self.dir = dir
        self.executeHook = executeHook
        self.excludeDirs = excludeDirs

    def isValidFile(self, fileName):
        return True

    def isValidDir(self, dir):
        # Only the last path component is compared with excludeDirs.
        head, tail = os.path.split(dir)
        valid = (not tail in self.excludeDirs)
        return valid

    def executeFile(self, path):
        if self.executeHook:
            self.executeHook(self, path)
        # else subclass Walker and override executeFile

    def execute(self):
        for root, dirs, fileNames in os.walk(self.dir):
            for fileName in fileNames:
                if self.isValidDir(root) and self.isValidFile(fileName):
                    path = os.path.join(root, fileName)
                    self.executeFile(path)
        return self

class ReWalker(Walker):
    def __init__(self, dir, fileMatchRe, executeHook=None, excludeDirs=[]):
        Walker.__init__(self, dir, executeHook, excludeDirs)
        self.fileMatchPat = re.compile(fileMatchRe)

    def isValidFile(self, fileName):
        return self.fileMatchPat.match(fileName)

#######################################################
"""
For testing:
"""
def RenameFile(path, matchRe, subRe):
    dir, name = os.path.split(path)
    newName = re.sub(matchRe, subRe, name)
    if newName != name:
        print("%s -> %s" % (name, newName))
        newPath = os.path.join(dir, newName)
        os.rename(path, newPath)

def Rename1(walker, path):
    RenameFile(path, r"(.*)\.pyc$", r"#\1.pyc#")

def Rename2(walker, path):
    RenameFile(path, r"#(.*)\.pyc#$", r"\1.pyc")

def Test():
    """ renames pyc files to #.*pyc# then restores them back again """
    walker = ReWalker(r"C:\Dev\Copy of PyUtils", r".*\.pyc$", Rename1, [".svn"]).execute()
    walker = ReWalker(r"C:\Dev\Copy of PyUtils", r".*\.pyc#$", Rename2, [".svn"]).execute()

if __name__ == "__main__":
    Test()
At one point I found myself having to do frequent housekeeping/utility chores on a large body of source code which included Subversion directories that I obviously didn't want to touch. Over time I accumulated and refined code to make that easy and straightforward. There are many ways to do this, of course. This is my latest version.
I also prefer using regular expressions for file names instead of glob.
The motivation here is DRY (Don't Repeat Yourself). os.walk makes it easy to traverse directories, but I didn't want to keep cutting and pasting that block of code. I wanted to write one function that does what I want to a single file and just plug that function into a larger call.
Exclude dirs won't work as intended. I'm fairly new to Python so I'm not 100% sure on this, but it looks like directory exclusion won't work as intended (or at least as I assume it's intended).
If I am excluding ".svn" directories, this will prevent executeFile() from being called on the files directly in the .svn/ directory, but not on files in directories under .svn (which contain, for example, a copy of the currently checked-out version of the repository).
E.g. the following directory structure for hello.py which is stored in a subversion repository:
If I was matching the regular expression "^hello." using a ReWalker and excluding ".svn" dirs:
This would match:
Presumably to make it work as I assume it was intended, the isValidDir function needs to check more than just the last component of the path (the tail from os.path.split), and instead iterate through each directory (probably ignoring the self.dir prefix in the dir being tested by isValidDir)...
Though I'm sure there's a better way (more elegant in Python at the very least) to code this...
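A minimal sketch of that idea, assuming a hypothetical helper named is_excluded_path (it is not part of the recipe): it drops the start-directory prefix and tests every remaining path component against the exclusion list, instead of only the last one.

```python
import os

def is_excluded_path(dir_path, start_dir, exclude_dirs):
    """Return True if any component of dir_path below start_dir is excluded.

    is_excluded_path is a hypothetical helper used for illustration only.
    """
    # Path of dir_path relative to the directory the walk started from.
    rel = os.path.relpath(dir_path, start_dir)
    if rel == os.curdir:
        return False  # the start directory itself is never excluded
    # Check every component, not just the tail.
    return any(part in exclude_dirs for part in rel.split(os.sep))
```

With this check, a directory such as .svn/text-base is rejected because one of its parents is ".svn", even though its own name is not in the list.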
Nice work Jack, but I have to agree, the excluded directory feature doesn't work. Andrew, your solution could work, but there's actually a simpler way:
The os.walk() function yields a 3-tuple at each step: the root dir, a list of sub-directories, and a list of files. As the API doc says (http://docs.python.org/lib/os-file-dir.html), the list of sub-directories can be modified in place to limit where os.walk() descends next. So, in the Walker class, first we can get rid of the isValidDir() method, and here's the new execute() method:
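The in-place pruning described here can be sketched as a standalone generator. The name walk_pruned is an assumption made for illustration, and this is not the commenter's original listing.

```python
import os

def walk_pruned(start_dir, exclude_dirs):
    """Yield file paths under start_dir, skipping excluded directory names.

    walk_pruned is a hypothetical name, not part of the recipe.
    """
    for root, dirs, file_names in os.walk(start_dir):
        # Slice assignment edits the very list object os.walk holds, so
        # excluded directories are never descended into. This relies on
        # the default topdown=True traversal order.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for file_name in file_names:
            yield os.path.join(root, file_name)
```

Rebinding the name with dirs = [...] would create a new list that os.walk never sees, so the slice assignment is the part that matters.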
With reference to the new execute method by Guillaume Rava: the modified code left out the check for a valid file, which matters when a regular expression is used to filter the files.
The revised code should now be:
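A minimal sketch of what such a revision might look like, combining the in-place pruning with the per-file check. The class name PruningReWalker and the excludeDirs=None default are assumptions for this self-contained example; the other names follow the recipe.

```python
import os
import re

class PruningReWalker(object):
    """Hypothetical compact variant of the recipe's ReWalker."""

    def __init__(self, dir, fileMatchRe, executeHook=None, excludeDirs=None):
        self.dir = dir
        self.fileMatchPat = re.compile(fileMatchRe)
        self.executeHook = executeHook
        self.excludeDirs = excludeDirs or []

    def isValidFile(self, fileName):
        return self.fileMatchPat.match(fileName) is not None

    def executeFile(self, path):
        if self.executeHook:
            self.executeHook(self, path)

    def execute(self):
        for root, dirs, fileNames in os.walk(self.dir):
            # Prune excluded directories in place so os.walk skips them.
            dirs[:] = [d for d in dirs if d not in self.excludeDirs]
            for fileName in fileNames:
                # Keep the per-file filter alongside the pruning.
                if self.isValidFile(fileName):
                    self.executeFile(os.path.join(root, fileName))
        return self
```

Because pruning happens before os.walk descends, nothing under an excluded directory is ever offered to isValidFile.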
Hm, it would be better to use os.listdir:
...and isValidDir should be changed if you go that way:
How can I use your functions to walk a directory and ignore all files or directories whose names begin with '.' (e.g. '.svn')? I added the following code, but it has bugs. Please help. Thanks.
[code]
"""
For testing:
"""
def ProcessFile(walker, path):
    print("walker " + walker + " path " + path)

if __name__ == "__main__":
    walker = ReWalker(r"C:\test\com.comp.hw.prod.proj.war\bin", r".*", ProcessFile, ["."]).execute()

C:\python>ReWalker.py
Traceback (most recent call last):
  File "C:\python\ReWalker.py", line 80, in <module>
    walker = ReWalker(r"C:\test\com.comp.hw.prod.proj.war\bin", r".*", ProcessFile, ["."]).execute()
AttributeError: 'ReWalker' object has no attribute 'execute'
[/code]
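The recipe compares whole directory names against excludeDirs, so passing ["."] never matches a name like ".svn". Separately, "walker " + walker in ProcessFile concatenates a string with an object and would raise a TypeError once the hook runs. A minimal sketch of prefix-based filtering, assuming a hypothetical standalone helper walk_skip_hidden rather than the recipe's classes:

```python
import os

def walk_skip_hidden(start_dir):
    """Yield paths of files under start_dir, ignoring dot-prefixed names.

    walk_skip_hidden is a hypothetical helper, not part of the recipe.
    """
    for root, dirs, file_names in os.walk(start_dir):
        # Prune hidden directories in place so os.walk never enters them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for file_name in file_names:
            # Skip hidden files in the directories that are visited.
            if not file_name.startswith("."):
                yield os.path.join(root, file_name)
```

The same startswith(".") tests could be placed in overridden isValidDir and isValidFile methods of a Walker subclass if the hook-based interface is preferred.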