
This function walks a directory tree starting at a specified root folder, and returns a list of all of the files (and optionally folders) that match our pattern(s).

import fnmatch
import os


def Walk(root, recurse=False, pattern='*', return_folders=False):
    """Return paths below root whose names match pattern.

    pattern may contain several UNIX shell-style patterns separated by ';'.
    """
    result = []

    # must have at least a readable root folder
    try:
        names = os.listdir(root)
    except OSError:
        return result

    # expand pattern; an empty pattern means "match everything"
    pattern = pattern or '*'
    pat_list = pattern.split(';')

    # check each entry
    for name in names:
        fullname = os.path.normpath(os.path.join(root, name))

        # grab it if it matches a pattern and is the right entry type;
        # stop at the first matching pattern so nothing is added twice
        for pat in pat_list:
            if fnmatch.fnmatch(name, pat):
                if os.path.isfile(fullname) or (return_folders and os.path.isdir(fullname)):
                    result.append(fullname)
                break

        # recursively scan sub-folders (skipping symlinks to avoid cycles),
        # appending their results
        if recurse and os.path.isdir(fullname) and not os.path.islink(fullname):
            result.extend(Walk(fullname, recurse, pattern, return_folders))

    return result


if __name__ == '__main__':
    # test code
    print('\nExample 1:')
    files = Walk('.', True, '*', True)
    print('There are %s files and folders below the current location:' % len(files))
    for path in files:
        print(path)

    print('\nExample 2:')
    files = Walk('.', True, '*.py;*.html')
    print('There are %s files below the current location:' % len(files))
    for path in files:
        print(path)

The standard directory-tree function os.path.walk (Python 2 only; Python 3 replaced it with os.walk) can be confusing and is difficult to customize. It can also be slow. Here's an alternative that lets you choose the root folder, whether to recurse through sub-folders, the file pattern(s) to match, and whether to include folder names in the results.
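For comparison, here is a minimal sketch (not part of the original recipe, and assuming Python 3) of roughly the same behaviour built on os.walk, which yields (dirpath, dirnames, filenames) tuples. The name walk_with_oswalk is hypothetical, and it reuses the same ';'-separated pattern convention. Results may come back in a different order than Walk's, and os.walk counts every non-directory entry as a file, so special files may be treated differently.

```python
import fnmatch
import os


def walk_with_oswalk(root, recurse=False, pattern='*', return_folders=False):
    """Hypothetical os.walk-based counterpart of Walk (a sketch)."""
    pat_list = (pattern or '*').split(';')
    result = []

    def matches(name):
        # any() stops at the first matching pattern, so no entry is added twice
        return any(fnmatch.fnmatch(name, pat) for pat in pat_list)

    # os.walk does not descend into directory symlinks by default
    # (followlinks=False) and silently skips directories it cannot list
    for dirpath, dirnames, filenames in os.walk(root):
        candidates = list(filenames)
        if return_folders:
            candidates += dirnames
        for name in candidates:
            if matches(name):
                result.append(os.path.normpath(os.path.join(dirpath, name)))
        if not recurse:
            # emptying dirnames in place stops os.walk from going deeper
            dirnames[:] = []
    return result
```

Pruning dirnames in place is the documented way to control how far a top-down os.walk descends, which is what replaces the explicit recursion in Walk.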

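The file patterns are UNIX shell-style wildcards, as understood by the fnmatch module. Matching is case-insensitive on Windows but case-sensitive on POSIX systems such as Linux, because fnmatch.fnmatch normalizes case with os.path.normcase. Multiple patterns may be specified; separate them with semicolons. Note that this means semicolons themselves can't be part of a pattern. Boo-hoo.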
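If you want the same case behaviour on every platform, one option is to do the case handling yourself and call fnmatch.fnmatchcase, which never applies os.path.normcase. The sketch below assumes you want case-insensitive matching by default; the helper name matches_any and its ignore_case flag are hypothetical, not part of the recipe.

```python
import fnmatch


def matches_any(name, pattern, ignore_case=True):
    """Return True if name matches any ';'-separated pattern (hypothetical helper)."""
    pat_list = (pattern or '*').split(';')
    if ignore_case:
        # Lowercase both sides so the result is the same on every platform.
        name = name.lower()
        pat_list = [pat.lower() for pat in pat_list]
    # fnmatchcase compares exactly, so case handling is fully explicit here.
    return any(fnmatch.fnmatchcase(name, pat) for pat in pat_list)


print(matches_any('README.TXT', '*.txt;*.md'))                     # True
print(matches_any('README.TXT', '*.txt;*.md', ignore_case=False))  # False
```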

4 comments

Saveen Reddy 20 years, 7 months ago

Simplification for pat_list.

Hi, minor comment ...

The logic used for creating pat_list can be simplified ...

FROM THIS:
        if not pattern:
            pat_list = ['*']
        elif ';' in pattern:
            pat_list = pattern.split(';')
        else:
            pat_list = [pattern]

TO THIS:
        pat_list = pattern.split(';')


Thanks,
-Saveen
Jürgen Hermann 20 years, 6 months ago

Not TOO simple :). You still need

pattern = pattern or '*'

before the split, else your code is not functionally equivalent.
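The reason is easy to verify. Splitting never returns an empty list, so a string with no ';' already becomes a one-element list; but an empty string becomes a list holding one empty pattern, and an empty pattern matches no non-empty filename. A short check, assuming Python 3's str.split in place of string.splitfields:

```python
import fnmatch

# str.split always returns at least one element
print('*.py'.split(';'))         # ['*.py']
print('*.py;*.html'.split(';'))  # ['*.py', '*.html']
print(''.split(';'))             # ['']

# an empty pattern matches no non-empty name...
print(fnmatch.fnmatch('Walk.py', ''))  # False

# ...so an empty pattern must become '*' before splitting
pattern = ''
pat_list = (pattern or '*').split(';')
print(pat_list)                        # ['*']
```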

Robin Parmar (author) 20 years, 5 months ago

Updated. Thanks for the comments; I have implemented this change.

Marcel van der Laan 19 years, 5 months ago

Avoid duplicate matches. To avoid multiple matches for different patterns, replace the 'continue' statement with a 'break' statement.

For example, if the pattern were set to "*;*.py", all Python files would be included twice. This is most likely not what you want.
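The difference between the two statements can be seen without touching the file system. The sketch below uses a hypothetical collect function that mimics only the inner pattern loop, with a flag choosing whether the loop stops at the first matching pattern (break) or moves on to the next one (what a trailing continue does):

```python
import fnmatch


def collect(names, pattern, stop_at_first_match):
    """Mimic the recipe's inner pattern loop on a plain list of names (hypothetical)."""
    result = []
    pat_list = (pattern or '*').split(';')
    for name in names:
        for pat in pat_list:
            if fnmatch.fnmatch(name, pat):
                result.append(name)
                if stop_at_first_match:
                    break  # at most one append per name
                # otherwise go on to the next pattern, as a trailing 'continue' would
    return result


names = ['Walk.py', 'index.html']
print(collect(names, '*;*.py', stop_at_first_match=False))  # ['Walk.py', 'Walk.py', 'index.html']
print(collect(names, '*;*.py', stop_at_first_match=True))   # ['Walk.py', 'index.html']
```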