This is a function to iterate over a container and its elements that checks for recursive traps. The condition for descending into elements is highly configurable (a list of type() results, or a callable). Spin-offs: a function to check for iterability in a loose and in a strict sense; a function to flatten a hierarchical container into a list of its elements.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | def iterable( x, strict = 0 ):
"""Return true if +x+ allows application of the +in+ operator (which excludes, for example,
numbers and +None+), *and* the +in+ loop is actually entered into (excluding sequences of
length zero). If +strict+ is true, an affirmative answer is furthermore subject to the
condition that the first element retrieved within an +in+ loop be not equal to the thing
being iterated over, which excludes strings of length one. -- Used by +walk()+ and
+flatten()+."""
try:
for element in x:
if strict:
return element != x
return 1
except TypeError:
pass
return 0
class RecursiveContent:
"""Wrapper class for content classed as displaying recursive behavior. The original content
is available as ~+.data+. Given content +foo+ that prints a symbolic representation like,
say, +^blah$+, the representation +`RecursiveContent(foo)+ is +^...$+, ie, the surround
characters are kept, while the inner part is replaced by a typographical ellipsis. This is
somewhat compatible with the way Python symbolizes a recursive list. -- Used by +walk()+
and +flatten()+."""
def __init__( self, data ): self.data = data
def __call__( self ): return self.data
def __repr__( self ): return '%s...%s' % ( `self.data`[0], `self.data`[-1] )
def __iter__( self ): return iter( self.data )
def walk( seq, descend = ( list, tuple ), ancestry = None ):
"""This function provides a convenient and recursion-proof way to iterate over arbitrary
data (which is a surprisingly involved task). +seq+ may be anything accepting the
application of the +in+ operator; +descend+ should either be a sequence of types that will
be descended into, or a function that decides, when called with a sequence element as
argument, whether to descend into that element or not. +ancestry+ is an argument used by
+walk()+ for internal bookkeeping (but you can manipulate it for special effects). -- Note
that +descend+ is only checked when trying to decide whether to descend into *elements* of
the sequence given, not when iterating over the sequence itself. Hence, if you pass a tuple
but specify that only lists should be descended into, the outermost sequence will still be
analyzed into elements. For example::
seq = ( 0, 1, 2, [ 3, 4 ], 5, ( 6, 7 ), 8 )
for e in walk( seq, ( list, ) ):
print e,
will print ::
0 1 2 3 4 5 (6, 7) 8
onto the screen. It is possible to pass +str+ as an element in +descend+, which will split
strings into characters, so the results of ::
walk( ( 0, 'abc', 1 ), ( str, ) )
and
walk( ( 0, 'a', 'b', 'c', 1 ) )
become indistinguishable.
Note that data whose type appears in +descend+ but which are inherently non-iterable will
simply be absent from the result. This is logical since iterables whose types are listed in
+descend+ but happen to be empty likewise do not leave any trace in the result (since they
don't offer elements to be looped over).
As convenient spin-offs, there are two more functions related to walk: +iterable()+ and
+flatten()+, q.v.
IMPLEMENTATION
==============
The fundamental function is as simple as this::
def walk( seq, descend ):
for element in seq:
if descend( element ):
for subelement in walk( element, descend ):
yield subelement
else:
yield element
However, this definition is not safe if an iterable contains itself as an element. In order
to prevent the function from descending infinitely, we need to keep an account of what
elements we have already come across in the same branch; argument +ancestry+ does exactly
that (it is o.k. to put copies of he same container side by side into another container,
since that does not lead to recursion; we only need check for cases where a container
contains itself). If a recursive element is detected, an instance of +RecursiveContent+ is
returned (that the caller might want to check for). +RecursiveContent+s print out much the
same way Python's standard representation of recursive lists does.
Another difficulty comes with strings: when a string is being iterated over, it yields
characters. Now, a character is simply a string of length one, so once we allow descending
into strings on the ground of their type, we also allow to descend into characters, which
would result in recursion. To prevent this, +walk()+ additionally checks for 'strict
iterability', as defined by function +iterable()+ when called with +strict=1+.
THANKSTO
========
This code was inspired by the posting "Walk a directory tree using a generator" by Tom Good,
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873.
"""
# If descend is not already a function, construct one that
# assumes that descend has been given as a list of types:
_descend = descend
if not callable( _descend ):
_descend = lambda x: type( x ) in descend
# Initialize ancestry:
if ancestry is None: ancestry = []
# Register sequence as ancestor:
ancestry.append( seq )
# Unconditionally loop over all elements in sequence:
for element in seq:
# If element is in ancestry:
if element in ancestry:
yield RecursiveContent( element )
# If element is not in ancestry:
else:
# If element is eligible for descend, see whether to actually descend
# into it, yield it wholly, or don't yield anything:
if _descend( element ):
# If element is eligible and strictly iterable (i.e., not inherently
# recursive and not an empty sequence), descend:
if iterable( element, strict = 1 ):
for subelement in walk( element, descend, ancestry ):
yield subelement
# If element is eligible and not strictly, but at least loosely iterable,
# yield it (because then it is something like a single-character string
# which should not be lost):
elif iterable( element ):
yield element
# If element is not eligible for descend, yield it:
else:
yield element
ancestry.pop()
def flatten( seq, descend = ( list, tuple ) ):
"""Return a list of all the elements that result from an exhaustive +walk()+ over a given
+seq+uence. See +walk()+ for the interpretation of +descend+."""
return [ element for element in walk( seq, descend ) ]
|
Tags: algorithms
Backquotes in __repr__. In the class RecursiveContent, there are backquotes around the data references:
What is the purpose of the backquotes?