Welcome, guest | Sign In | My Account | Store | Cart

The function `foldsh` in this recipe is a general purpose tool for transforming tree-like recursive data structures while keeping track of shared subtrees.

``````# By default, a branch is encoded as a list of subtrees; each subtree can be a
# branch or a leaf (=anything non-iterable). Subtrees can be shared:
>>> subtree = [42,44]
>>> tree = [subtree,[subtree]]
# We can apply a function to all leaves:
>>> foldsh(tree, leaf= lambda x: x+1)
[[43, 45], [[43, 45]]]
# Or apply a function to the branches:
>>> foldsh(tree, branch= lambda t,c: list(reversed(c)))
[[[44, 42]], [44, 42]]
# The sharing is preserved:
>>> _ is _
True
# Summing up the leaves without double counting of shared subtrees:
>>> foldsh(tree, branch= lambda t,c: sum(c), shared= lambda x: 0)
86
``````

In particular, it is useful for transforming YAML documents. An example of this is given below.

Python, 28 lines
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28``` ```def identity(x): return x def foldsh(tree, branch= lambda tree,ch: list(ch), leaf=identity, shared=identity, getchildren=iter): '''Sharing-aware tree transformations. foldsh(tree,branch,leaf,shared,getchildren) :: tree -> result branch :: tree, list of children's results -> result leaf :: tree -> result shared :: result -> result getchildren :: tree -> iterable of children''' mem = dict() def _fold(tree): if id(tree) in mem: return shared(mem[id(tree)]) else: try: children = getchildren(tree) except: res = leaf(tree) else: reschildren = [_fold(child) for child in children] res = branch(tree,reschildren) mem[id(tree)] = res return res return _fold(tree) ```
##### Changelog

Revision 2: rewrite using the try...except...else idiom which I didn't know before and makes the logical structure clearer. Functionality is unchanged.

##### Background

Many transformations on trees follow the same general pattern. In a recursive traversal,

• first, leaves are transformed into results (here, this is the task of the function `leaf`);
• when all subtrees (children) of a branch have been transformed, the results of these children are combined into a result for the branch (the task of function `branch`);
• this way, results are passed up the tree until the result for the whole tree is constructed.

In this recipe, the recursion is performed by `foldsh`; the transformation functions `leaf` and `branch` are supplied by the user. Note that a "result" can be anything, for example

• an integer counting the number of leaves, depth of the tree, maximal breadth of branches;
• a list of all the leaves in the tree;
• another tree.

The general pattern is known as a fold (and in Python as `reduce`; but both these terms are mostly used for inductive lists) or catamorphism.

In addition to implementing a regular fold, this recipe supports sharing of subtrees. (So the data structure is actually a DAG, but we'll still say tree.) If during the recursive traversal `foldsh` encounters a subtree that it has seen before, it will not recurse into it, but instead take the previously constructed result, and apply `shared` to it.

To make `foldsh` recurse over other tree-like structures, supply it with a different `getchildren` function. This function will be applied to the tree (and subtrees) to determine whether we have reached a branch or a leaf. If `getchildren` is applied to a branch, it should return an iterable of subtrees; if applied to a leaf, it should throw an exception.

With no arguments except its first, `foldsh(tree)` returns a copy of `tree` where the structure consists of newly constructed lists, and the leaves are shared with `tree`. In the new structure, the sharing of subtrees is preserved.

##### Examples
``````def countsh(tree):
'''Count the number of leaf+branch nodes (count shared subtrees once).'''
def branch(tree,reschildren):
return 1 + sum(reschildren)
def leaf(tree):
return 1
def shared(tree):
return 0
return foldsh(tree,branch,leaf,shared)

def listleaves(tree):
'''List all leaves (but those from shared subtrees only once).'''
def leaf(tree):
return [tree]
def branch(tree,reschildren):
result = []
for rc in reschildren:
result.extend(rc)
return result
def shared(res):
return []
return foldsh(tree,branch,leaf,shared)

# Actually, this can also be elegantly solved by returning iterators as
# result. We do not have to make an exception for shared subtrees, because
# these (shared!) iterators will already be exhausted when encountered for
# the second time:

def iterleavesonce(tree):
'''Iterate all leaves (but those from shared subtrees only once).'''
def leaf(tree):
return iter([tree])
def branch(tree,reschildren):
return itertools.chain(*reschildren)
return foldsh(tree,branch,leaf)

# Build a tree out of frozensets instead of lists:
subtree = [42,44]
tree = [subtree,[subtree]]
fstree = foldsh(tree, branch= lambda t,c: frozenset(c))
# fstree is now:
# frozenset([frozenset([42, 44]), frozenset([frozenset([42, 44])])])
# with shared subtrees
``````
##### Strings

Watch out with strings as leaves. The default `getchildren` will go into infinite recursion because `iter('s')` will yield 's' again. To solve this, you can use:

``````def getchildren(tree):
assert not isinstance(tree,(str,unicode))
return iter(tree)
``````
##### YAML

YAML is a human-readable language for data structure serialization with support for sharing. A YAML sequence corresponds to a Python `list` and a YAML mapping to a `dict`. We can use `foldsh` to recurse into the data structure (for the dicts, we only recurse into the values), and, for example, reverse all the lists:

``````import yaml    # from http://pyyaml.org/

def reverselists(tree):
def branch(tree,reschildren):
if isinstance(tree,dict):
return dict(zip(tree.iterkeys(),reschildren))
else:
return list(reversed(reschildren))
def getchildren(tree):
try:
return tree.itervalues()
except:
assert not isinstance(tree,str)
return iter(tree)
return foldsh(tree,branch,getchildren=getchildren)

ingredients:
- &s       # this element is given label s and can be shared later
- spam
- SPAM
- eggs
- *s       # like here
- *s       # and here
''')
``````

Sharing is preserved in the resulting YAML:

``````>>> print(yaml.dump(reverselists(y),default_flow_style=False))
ingredients:
- &id001
- SPAM
- spam
- *id001
- eggs
- *id001
``````
##### Cycles

This recipe does not support cyclic data structures (it will go into infinite recursion). For a way to deal with this, see Recipe 578118. Created by Sander Evers on Wed, 2 May 2012 (MIT)

### Required Modules

• (none specified)