The function `foldsh`

in this recipe is a general purpose tool for transforming tree-like recursive data structures while keeping track of shared subtrees.

```
# By default, a branch is encoded as a list of subtrees; each subtree can be a
# branch or a leaf (=anything non-iterable). Subtrees can be shared:
>>> subtree = [42,44]
>>> tree = [subtree,[subtree]]
# We can apply a function to all leaves:
>>> foldsh(tree, leaf= lambda x: x+1)
[[43, 45], [[43, 45]]]
# Or apply a function to the branches:
>>> foldsh(tree, branch= lambda t,c: list(reversed(c)))
[[[44, 42]], [44, 42]]
# The sharing is preserved:
>>> _[0][0] is _[1]
True
# Summing up the leaves without double counting of shared subtrees:
>>> foldsh(tree, branch= lambda t,c: sum(c), shared= lambda x: 0)
86
```

In particular, it is useful for transforming YAML documents. An example of this is given below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | ```
def identity(x):
return x
def foldsh(tree, branch= lambda tree,ch: list(ch), leaf=identity, shared=identity, getchildren=iter):
'''Sharing-aware tree transformations.
foldsh(tree,branch,leaf,shared,getchildren) :: tree -> result
branch :: tree, list of children's results -> result
leaf :: tree -> result
shared :: result -> result
getchildren :: tree -> iterable of children'''
mem = dict()
def _fold(tree):
if id(tree) in mem:
return shared(mem[id(tree)])
else:
try:
children = getchildren(tree)
except:
res = leaf(tree)
else:
reschildren = [_fold(child) for child in children]
res = branch(tree,reschildren)
mem[id(tree)] = res
return res
return _fold(tree)
``` |

##### Changelog

Revision 2: rewrite using the try...except...**else** idiom which I didn't know before and makes the logical structure clearer. Functionality is unchanged.

##### Background

Many transformations on trees follow the same general pattern. In a recursive traversal,

- first, leaves are transformed into results (here, this is the task of the function
`leaf`

); - when all subtrees (children) of a branch have been transformed, the results of these children are combined into a result for the branch (the task of function
`branch`

); - this way, results are passed up the tree until the result for the whole tree is constructed.

In this recipe, the recursion is performed by `foldsh`

; the transformation functions `leaf`

and `branch`

are supplied by the user. Note that a "result" can be anything, for example

- an integer counting the number of leaves, depth of the tree, maximal breadth of branches;
- a list of all the leaves in the tree;
- another tree.

The general pattern is known as a fold (and in Python as `reduce`

; but both these terms are mostly used for inductive lists) or catamorphism.

In addition to implementing a regular fold, this recipe supports sharing of subtrees. (So the data structure is actually a DAG, but we'll still say tree.) If during the recursive traversal `foldsh`

encounters a subtree that it has seen before, it will not recurse into it, but instead take the previously constructed result, and apply `shared`

to it.

To make `foldsh`

recurse over other tree-like structures, supply it with a different `getchildren`

function. This function will be applied to the tree (and subtrees) to determine whether we have reached a branch or a leaf. If `getchildren`

is applied to a branch, it should return an iterable of subtrees; if applied to a leaf, it should throw an exception.

With no arguments except its first, `foldsh(tree)`

returns a copy of `tree`

where the structure consists of newly constructed lists, and the leaves are shared with `tree`

. In the new structure, the sharing of subtrees is preserved.

##### Examples

```
def countsh(tree):
'''Count the number of leaf+branch nodes (count shared subtrees once).'''
def branch(tree,reschildren):
return 1 + sum(reschildren)
def leaf(tree):
return 1
def shared(tree):
return 0
return foldsh(tree,branch,leaf,shared)
def listleaves(tree):
'''List all leaves (but those from shared subtrees only once).'''
def leaf(tree):
return [tree]
def branch(tree,reschildren):
result = []
for rc in reschildren:
result.extend(rc)
return result
def shared(res):
return []
return foldsh(tree,branch,leaf,shared)
# Actually, this can also be elegantly solved by returning iterators as
# result. We do not have to make an exception for shared subtrees, because
# these (shared!) iterators will already be exhausted when encountered for
# the second time:
def iterleavesonce(tree):
'''Iterate all leaves (but those from shared subtrees only once).'''
def leaf(tree):
return iter([tree])
def branch(tree,reschildren):
return itertools.chain(*reschildren)
return foldsh(tree,branch,leaf)
# Build a tree out of frozensets instead of lists:
subtree = [42,44]
tree = [subtree,[subtree]]
fstree = foldsh(tree, branch= lambda t,c: frozenset(c))
# fstree is now:
# frozenset([frozenset([42, 44]), frozenset([frozenset([42, 44])])])
# with shared subtrees
```

##### Strings

Watch out with strings as leaves. The default `getchildren`

will go into infinite
recursion because `iter('s')`

will yield 's' again. To solve this, you can use:

```
def getchildren(tree):
assert not isinstance(tree,(str,unicode))
return iter(tree)
```

##### YAML

YAML is a human-readable language for data structure serialization with support for sharing. A YAML sequence corresponds to a Python `list`

and a YAML mapping to a `dict`

. We can use `foldsh`

to recurse into the data structure (for the dicts, we only recurse into the values), and, for example, reverse all the lists:

```
import yaml # from http://pyyaml.org/
def reverselists(tree):
def branch(tree,reschildren):
if isinstance(tree,dict):
return dict(zip(tree.iterkeys(),reschildren))
else:
return list(reversed(reschildren))
def getchildren(tree):
try:
return tree.itervalues()
except:
assert not isinstance(tree,str)
return iter(tree)
return foldsh(tree,branch,getchildren=getchildren)
y = yaml.load('''
ingredients:
- &s # this element is given label s and can be shared later
- spam
- SPAM
- eggs
- *s # like here
- *s # and here
''')
```

Sharing is preserved in the resulting YAML:

```
>>> print(yaml.dump(reverselists(y),default_flow_style=False))
ingredients:
- &id001
- SPAM
- spam
- *id001
- eggs
- *id001
```

##### Cycles

This recipe does not support cyclic data structures (it will go into infinite recursion). For a way to deal with this, see Recipe 578118.