Welcome, guest | Sign In | My Account | Store | Cart

A generator that does what os.path.walk does, without the need for a callback function.

Python, 55 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
from __future__ import generators # needed for Python 2.2
import os
import stat
import types

def walktree(top=".", depthfirst=False):
    """
    Walk a directory tree, starting from 'top'

    This code is based on os.path.walk, with the callback function
    replaced by a yield and recursion replaced by iteration
    """
    if type(top) != types.StringType:
        raise TypeError("top must be a string")

    # decide which end of the stack to pop
    if depthfirst:
        index = -1
    else:
        index = 0
    
    dirstack = [top]
    while dirstack:
        top = dirstack.pop(index)
        try:
            names = os.listdir(top)
        except os.error:
             return
        yield top, names
        dirs = []
        for name in names:
            name = os.path.join(top, name)
            try:
                st = os.lstat(name)
            except os.error:
                continue
            if stat.S_ISDIR(st.st_mode):
                dirs.append(name)
        # the depth-first results look 'backwards' to me
        # so I'm changing them to fit my expectations
        if depthfirst:
            dirs.reverse()
        dirstack.extend(dirs)

if __name__ == "__main__":
    """
    a sample:
    look for python scripts starting in the current directory
    """
    
    for top, names in walktree():
        print "searching folder %s" % (top,)
        for name in names:
            if name[-3:] == ".py":
                print "I think %s is a python script" % (name,)

This is a rewrite of os.path.walk. It's convenient for processing files that inhabit directory trees. Its original purpose was to vist the '.h' files of a large C++ project and wrap them in#ifdef include guards.

Thanks to Tim Peters for pointing out that using an internal stack instead of recursion gives one the choice of depth-first or width-first visits.

Thanks to Raymond Hettinger for pointing out st.st_mode and urging me to get rid of the global search mode definitions.

Thanks to Noah Spurrier for suggesting a sample.

4 comments

Noah Spurrier 21 years, 1 month ago  # | flag

Neat, but give an example! This looks like a neat use of generators, but it needs an example of its use. Maybe give an example of converting all filenames to lowercase or something.

Noah Spurrier 21 years ago  # | flag

Depth First is broken. I just noticed that Depthfirst is broken. It still walks the tree breadth first. A fix is not easy as it would involve recursion or an additional stack. I'm not sure how recurssion works with generators.

For example, take the following directory tree:

    AAA
   /   \
BBB     CCC
       /   \
    DDD     EEE

This structure can be created using these commands:

mkdir AAA
mkdir AAA/BBB
mkdir AAA/CCC
mkdir AAA/CCC/DDD
mkdir AAA/CCC/EEE

Then run this python code to walk the tree:

root = 'AAA'
w = walktree (root, True)
for (basepath, children) in w:
    print basepath, ':', children

The expected output for depth first tree traversal is this:

AAA/BBB : []
AAA/CCC/DDD : []
AAA/CCC/EEE : []
AAA/CCC : ['DDD', 'EEE']
AAA : ['BBB', 'CCC']

But it does not produce this. The actual output is below. This is actually still BREADTH first, but following the RIGHTMOST path first.

AAA : ['BBB', 'CCC']
AAA/CCC : ['DDD', 'EEE']
AAA/CCC/EEE : []
AAA/CCC/DDD : []
AAA/BBB : []

With depthfirst set to False you still get BREADTH first, but following the LEFTMOST path first:

AAA : ['BBB', 'CCC']
AAA/BBB : []
AAA/CCC : ['DDD', 'EEE']
AAA/CCC/DDD : []
AAA/CCC/EEE : []

Yours, Noah

Noah Spurrier 21 years ago  # | flag

Fix for depth first problem. The depth first problem is important. Consider how would you easily write a directory walker to rename all files AND directories to lower case? The obvious code does not do what one would expect:

from walktree import *
for (basepath, children) in walktree():
    for child in children:
        old_name = os.path.join(basepath, child)
        new_name = os.path.join(basepath, child.lower())
            print old_name, "-->", new_name
        os.rename (old_name, new_name)

That will try to rename the directories in the wrong order. It will try to rename a directory before it has visited the children of that directory. Given the following directory tree:

    AAA
   /   \
BBB     CCC
       /   \
    DDD     EEE

That code will try to do this:

os.rename ('./AAA/CCC', './AAA/ccc')

then later:

os.rename ('./AAA/CCC/DDD', './AAA/CCC/ddd') # ERROR!

But AAA/CCC does not exist anymore. It was previously renamed to AAA/ccc!

The easiest way to fix walktree() is to use recursion. It's interesting to note that generators may be nested recursively. The following code fixes walktree to make it work properly for depth first tree walking.

Yours,

Noah Spurrier

from __future__ import generators # needed for Python 2.2
import os
import stat

def walktree (top = ".", depthfirst = True):
    """This walks a directory tree, starting from 'top'.
    This is somewhat like os.path.walk, but using
    generators instead of a visit function.
    One important difference is that walktree() defaults
    to DEPTH first with optional BREADTH first,
    whereas the os.path.walk function allows only BREADTH first.
    Depth first was made the default because it is safer if
    you are going to be modifying directory names. This avoids
    the problem of renaming a directory before visiting the children
    of that directory.

    """
    names = os.listdir(top)
    if not depthfirst:
        yield top, names
    for name in names:
        try:
            st = os.lstat(os.path.join(top, name))
        except os.error:
            continue
        if stat.S_ISDIR(st.st_mode):
            for (newtop, children) in walktree (os.path.join(top, name), depthfirst):
                yield newtop, children
    if depthfirst:
        yield top, names

def test():
    for (basepath, children) in walktree():
        for child in children:
        print os.path.join(basepath, child)

if __name__ == '__main__':
    test()
Andrew Dalke 20 years, 8 months ago  # | flag

os.walk. Also, see 'os.walk', added in Python 2.3