ActiveState Code

Recipe 52664: Flexible directory walking


This function walks a directory tree starting at a specified root folder and returns a list of all the files (and, optionally, folders) whose names match the given pattern(s).

Python
def Walk(root, recurse=0, pattern='*', return_folders=0):
    import fnmatch, os

    # initialize
    result = []

    # must have at least root folder; an unreadable root yields an empty list
    try:
        names = os.listdir(root)
    except os.error:
        return result

    # expand pattern: an empty pattern means "match everything"
    pattern = pattern or '*'
    pat_list = pattern.split(';')

    # check each entry in this folder
    for name in names:
        fullname = os.path.normpath(os.path.join(root, name))

        # grab if it matches our pattern and entry type
        for pat in pat_list:
            if fnmatch.fnmatch(name, pat):
                if os.path.isfile(fullname) or (return_folders and os.path.isdir(fullname)):
                    result.append(fullname)
                # stop at the first matching pattern so an entry is never added twice
                break

        # recursively scan other folders (but not symbolic links), appending results
        if recurse:
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                result = result + Walk(fullname, recurse, pattern, return_folders)

    return result

if __name__ == '__main__':
    # test code
    print('\nExample 1:')
    files = Walk('.', 1, '*', 1)
    print('There are %s files and folders below current location:' % len(files))
    for filename in files:
        print(filename)

    print('\nExample 2:')
    files = Walk('.', 1, '*.py;*.html')
    print('There are %s files below current location:' % len(files))
    for filename in files:
        print(filename)

Discussion

The standard directory tree function os.path.walk can be confusing, and is difficult to customize. It can also be slow. Here's an alternative that allows you to choose the root folder, whether to recurse down through sub-folders, the file pattern to match, and whether to include folder names in the results.
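
For comparison, here is a minimal sketch of the same four options built on the standard library's os.walk generator instead of a recursive os.listdir scan. It is not the recipe's method: the name walk_matches is hypothetical, results may come back in a different order, and os.walk reports every non-directory entry as a file, so special files are treated slightly differently than with the os.path.isfile check.

```python
import fnmatch
import os


def walk_matches(root, recurse=False, pattern='*', return_folders=False):
    """Yield paths under root whose base name matches any ';'-separated pattern.

    Hypothetical helper for illustration; not part of the original recipe.
    """
    pat_list = (pattern or '*').split(';')
    # os.walk does not descend into symlinked folders by default
    for dirpath, dirnames, filenames in os.walk(root):
        candidates = list(filenames)
        if return_folders:
            candidates += dirnames
        for name in candidates:
            # any() stops at the first matching pattern, so no duplicates
            if any(fnmatch.fnmatch(name, pat) for pat in pat_list):
                yield os.path.normpath(os.path.join(dirpath, name))
        if not recurse:
            # emptying dirnames in place stops os.walk from going deeper
            del dirnames[:]
```

Because this is a generator, wrap the call in list() if you need the same return type as Walk, for example list(walk_matches('.', recurse=True, pattern='*.py;*.html')).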

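The file pattern is UNIX shell style, matched with fnmatch. Whether matching is case insensitive depends on the platform: it is on Windows, but not on POSIX systems. Multiple patterns may be specified; delimit them with a semicolon. Note that this means semicolons themselves can't be part of a pattern. Boo-hoo.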
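
The pattern handling on its own can be sketched as below. The helper name matches_any is hypothetical and only isolates the split-and-match step; it assumes, as the recipe does, that an empty pattern means "match everything".

```python
import fnmatch


def matches_any(name, pattern='*'):
    """Return True if name matches at least one ';'-separated pattern.

    Hypothetical helper for illustration; not part of the original recipe.
    """
    pat_list = (pattern or '*').split(';')
    # any() stops at the first hit, so a name counts once
    # however many of the patterns it matches
    return any(fnmatch.fnmatch(name, pat) for pat in pat_list)


names = ['setup.py', 'index.html', 'notes.txt']
selected = [n for n in names if matches_any(n, '*.py;*.html')]
print(selected)
```

If you need matching that is case sensitive on every platform, fnmatch.fnmatchcase can be substituted for fnmatch.fnmatch.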

Comments

  1. At 12:27 p.m. on 21 Apr 2001, Saveen Reddy said:

    Simplification for pat_list.

    Hi, minor comment ...
    
    The logic used for creating pat_list can be simplified ...
    
    FROM THIS:
            if not pattern:
                    pat_list = ['*']
            elif ';' in pattern:
                    pat_list = string.split(pattern, ';')
            else:
                    pat_list = [pattern]
    
    TO THIS:
            pat_list = string.splitfields( pattern , ';' )
    
    
    Thanks,
    -Saveen
    
  2. At 1:58 p.m. on 10 May 2001, Jürgen Hermann said:

    Not TOO simple :). You still need

    pattern = pattern or '*'

    before the split, else your code is not functionally equivalent.

  3. At 4:28 p.m. on 25 Jun 2001, Robin Parmar (the author) said:

    Updated. Thanks for the comments -- I have implemented this change.

  4. At 12:01 a.m. on 21 Jun 2002, Marcel van der Laan said:

    Avoid duplicate matches. To avoid multiple matches for different patterns, replace the 'continue' statement with a 'break' statement.

    For example, if the pattern was set as "*;*.py", all Python files would be included twice. This is most likely not what you want.
