This function walks a directory tree starting at a specified root folder, and returns a list of all of the files (and optionally folders) that match our pattern(s).
    import fnmatch
    import os


    def Walk(root, recurse=0, pattern='*', return_folders=0):
        # initialize
        result = []

        # must have at least root folder
        try:
            names = os.listdir(root)
        except OSError:
            return result

        # expand pattern
        pattern = pattern or '*'
        pat_list = pattern.split(';')

        # check each file
        for name in names:
            fullname = os.path.normpath(os.path.join(root, name))

            # grab if it matches our pattern and entry type
            for pat in pat_list:
                if fnmatch.fnmatch(name, pat):
                    if os.path.isfile(fullname) or (return_folders and os.path.isdir(fullname)):
                        result.append(fullname)
                    # note: this only moves on to the next pattern, so a name
                    # that matches several patterns is appended once per match
                    continue

            # recursively scan other folders, appending results
            if recurse:
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    result = result + Walk(fullname, recurse, pattern, return_folders)

        return result


    if __name__ == '__main__':
        # test code
        print('\nExample 1:')
        files = Walk('.', 1, '*', 1)
        print('There are %s files below current location:' % len(files))
        for path in files:
            print(path)

        print('\nExample 2:')
        files = Walk('.', 1, '*.py;*.html')
        print('There are %s files below current location:' % len(files))
        for path in files:
            print(path)
The standard directory tree function os.path.walk can be confusing, and is difficult to customize. It can also be slow. Here's an alternative that allows you to choose the root folder, whether to recurse down through sub-folders, the file pattern to match, and whether to include folder names in the results.
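The file pattern is UNIX shell style, as understood by the fnmatch module. Matching goes through fnmatch.fnmatch, which ignores case only where os.path.normcase folds case (Windows); on POSIX systems it is case-sensitive. Multiple patterns may be specified; delimit them with a semicolon. Note that this means semicolons themselves can't be part of a pattern. Boo-hoo.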
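As a small sketch of how a semicolon-delimited pattern string gets matched, the helper below splits the pattern and tests a name against each piece. The name `matches_any` is hypothetical and not part of the recipe. It also uses `fnmatch.fnmatchcase` rather than the recipe's `fnmatch.fnmatch`, purely so the printed results are the same on every platform.

```python
import fnmatch


def matches_any(name, pattern='*'):
    """Return True if name matches any of the ';'-delimited patterns."""
    # An empty or None pattern falls back to '*', as in the recipe.
    pat_list = (pattern or '*').split(';')
    # fnmatchcase never normalises case, so this behaves identically everywhere.
    return any(fnmatch.fnmatchcase(name, pat) for pat in pat_list)


print(matches_any('setup.py', '*.py;*.html'))    # True
print(matches_any('index.html', '*.py;*.html'))  # True
print(matches_any('notes.txt', '*.py;*.html'))   # False
```

Because the split happens on every semicolon, a pattern such as `a;b.txt` is always read as two patterns, which is the limitation mentioned above.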
Simplification for pat_list.
Not TOO simple :). You still need
pattern = pattern or '*'
before the split, else your code is not functionally equivalent.
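To illustrate why the default has to be applied before the split, here is a minimal sketch. The helper name `split_patterns` is made up for this example. Splitting an empty string yields a single empty pattern, which matches no ordinary filename, and `None` has no `split` method at all.

```python
import fnmatch


def split_patterns(pattern):
    # Hypothetical helper: apply the default first, then split.
    pattern = pattern or '*'
    return pattern.split(';')


print(''.split(';'))                    # [''] : one empty pattern
print(fnmatch.fnmatchcase('a.py', ''))  # False : the empty pattern matches nothing here
print(split_patterns(''))               # ['*']
print(split_patterns(None))             # ['*']
```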
updated. Thanks for the comments -- I have implemented this change.
Avoid duplicate matches. To avoid multiple matches for different patterns, replace the 'continue' statement with a 'break' statement.
For example, if the pattern was set as "*;*.py", all Python files would be included twice. This is most likely not what you want.
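The effect of that change can be sketched on a plain list of names, without touching the filesystem. The function `collect` and its `stop_at_first_match` flag are hypothetical stand-ins for the recipe's inner loop, and `fnmatch.fnmatchcase` is used only to keep the output platform-independent.

```python
import fnmatch


def collect(names, pattern, stop_at_first_match):
    """Mimic the recipe's inner matching loop on a list of names."""
    result = []
    for name in names:
        for pat in pattern.split(';'):
            if fnmatch.fnmatchcase(name, pat):
                result.append(name)
                if stop_at_first_match:
                    break  # one entry per name, like the suggested 'break'
    return result


names = ['a.py', 'b.txt']
print(collect(names, '*;*.py', False))  # ['a.py', 'a.py', 'b.txt']
print(collect(names, '*;*.py', True))   # ['a.py', 'b.txt']
```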