A variation on Recipe 578117 that can deal with cycles. A cycle means that a tree has itself as a subtree somewhere. A fold over such a data structure has a chicken-and-egg-problem: it needs its own result in order to construct its own result. To solve this problem, we let branch
construct a part of its result before going into recursion. After the recursion, branch
gets a chance to complete its result using its children's results. Python's support for coroutines (using yield
) provides a nice way to define such a two-stage branch
function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | def identity(x):
return x
def cyclist(tree):
res = []
reschildren = yield res
res.extend(reschildren)
yield
def foldcyc(tree,branch=cyclist,leaf=identity,shared=identity,getchildren=iter):
mem = dict()
def _fold(tree):
if id(tree) in mem:
return shared(mem[id(tree)])
else:
try:
children = getchildren(tree)
except:
res = leaf(tree)
mem[id(tree)] = res
return res
coroutine = branch(tree)
res = coroutine.next()
mem[id(tree)] = res
reschildren = [_fold(child) for child in children]
coroutine.send(reschildren)
return res
return _fold(tree)
|
Without additional arguments, foldcyc(tree)
defaults to reconstructing the structure using lists, while sharing the leaves.
The two-stage approach can be seen in cyclist
(no pun intended): before the recursive call into the children of tree
, it first constructs an empty list as its own result, for its children to link to. This is passed back to foldsh
using yield
. After foldsh
has finished the recursion, cyclist
is re-entered and receives the results, which it uses to complete its own result.
Example
>>> tree = [42,[]]
>>> tree.append(tree)
>>> foldcyc(tree, leaf= lambda x: x+1)
[43, [], <Recursion on list with id=147258796>]
>>> id(_)
147258796
Converting between implicit and explicit representation
We can also represent the cyclic list structure using explicit pointers. A list then contains these references instead of sublists. The administration that maps pointers to lists is kept in a dictionary. The following two functions convert between the two representations:
def egraph(igraph):
"""
Convert an implicit list graph to an explicit graph: a dict mapping
node ids to a list of children node ids.
"""
eg = {}
ids = itertools.count()
def branch(tree):
resid = ids.next()
reschildren = yield resid
eg[resid] = reschildren
yield
foldcyc(igraph,branch=branch)
return eg
def igraph(egraph,root):
"""
Convert a dict, mapping node ids to a list of children node ids,
into an implicit cyclic structure of lists.
"""
def getchildren(key):
return iter(egraph[key])
return foldcyc(root,getchildren=getchildren)
Usage (using the tree
defined above):
>>> egraph(tree)
{0: [42, 1, 0], 1: []}
>>> igraph(_,0)
[42, [], <Recursion on list with id=156110572>]
YAML example
PyYAML can load Python objects from YAML files using !tags
, but this doesn't use the class's __init__
function. We define an alternative YAML representation of a Python object: a mapping with class
and args
keys. We use foldcyc
to create (possibly cyclic) instances according to this representation.
import yaml # from http://pyyaml.org/
def initcyc(tree):
def branch(tree):
obj = object.__new__(globals()[tree['class']])
reschildren = yield obj
obj.__init__(*reschildren)
yield
def getchildren(tree):
return tree['args']
return foldcyc(tree,branch,getchildren=getchildren)
class A(object):
def __init__(self,spam,eggs):
self.spam = spam
self.eggs = eggs
class B(object):
pass
y = yaml.load('''
&a
class: A
args:
- class: B
args: []
- *a
''')
Let's check:
>>> a = initcyc(y)
>>> a
<tmp.A object at 0xa3baa2c>
>>> a.spam
<tmp.B object at 0xa3ba9ac>
>>> a.eggs
<tmp.A object at 0xa3baa2c>