An example showing how Python's features can be used to optimize the construction of treelike data structures in order to produce directed acyclic graphs (DAGs) instead.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84  """
Expr This is a class hierarchy for representing arithmetic
Atom expressions as DAGs. To keep the example simple, I
 Lit have factored as much as I could into the base classes.
 Var The most important method is Expr.__new__, which checks
 whether there is an existing node equivalent to the one
UnOp being constructed. If there is, then __new__ returns
 Neg the existing node; otherwise __new__ constructs a new
 Abs node, then interns and returns it.

BinOp For convenience, I have overloaded the arithmetic
Add operators so that they can be used to construct Exprs.
Sub E.g., something like Lit(1) + Lit(2) * Var('x') is
Mul equivalent to Add(Lit(1), Mul(Lit(2), Neg(Var('x')))).
Div But otherwise I have kept features to a minimum.
"""
class Expr(object):
__exprs = {} # dict for interning nodes
def __new__(cls, *args):
# Get the canonical form of the node, like a hash key. The
# simple technique below works because nodes are constructed
# bottomup, so we know that the contents of args are unique.
canonical = (cls, args)
# Now see if an equivalent node is already interned.
try:
Expr.__exprs[canonical]
print "Using existing %s node." % cls.__name__
except KeyError:
print "Constructing new %s node." % cls.__name__
Expr.__exprs[canonical] = object.__new__(cls, *args)
return Expr.__exprs[canonical]
def __neg__(self): return Neg(self)
def __abs__(self): return Abs(self)
def __add__(self, r): return Add(self, r)
def __sub__(self, r): return Sub(self, r)
def __mul__(self, r): return Mul(self, r)
def __div__(self, r): return Div(self, r)
class Atom(Expr):
def __init__(self, numOrStr):
try: self.v
except AttributeError: self.v = numOrStr
class Lit(Atom): pass
class Var(Atom): pass
class UnOp(Expr):
def __init__(self, operand):
try: self.o
except AttributeError: self.o = operand
class Neg(UnOp): pass
class Abs(UnOp): pass
class BinOp(Expr):
def __init__(self, left, right):
try: self.l # or self.r; whichever
except AttributeError: self.l = left; self.r = right
class Add(BinOp): pass
class Sub(BinOp): pass
class Mul(BinOp): pass
class Div(BinOp): pass
# A function to count the nodes in an expression
def count_nodes(exp):
visited = []
def walk(exp):
if exp in visited: return 0 # don't count nodes already visited
visited.append(exp)
if isinstance(exp, Atom): return 1
elif isinstance(exp, UnOp): return 1 + walk(exp.o)
elif isinstance(exp, BinOp): return 1 + walk(exp.l) + walk(exp.r)
return walk(exp)

Trees are useful, but for certain applications, they can lead to the redundant representation of data. For example, a naive implementation of a parse tree will represent common subexpressions as distinct subtrees, when it would be more efficient to represent them with a single, shared structure. For cases such as this, an appropriate data structure is a directed acyclic graph (DAG). DAGs are similar to trees, except that a node can have more than one parent (or, different nodes can share children); a class hierarchy forms a DAG, for example. Python's dictionaries and objectoriented features make it easy to modify the constructors of a treelike data structure so that DAGs are constructed instead.
In this recipe, the important work occurs in Expr.__new. This function checks if there is already an Expr equivalent to the one being constructed and returns it if it exists; otherwise it constructs a new node and interns it in a dictionary. The Expr class keeps a dictionary containing the nodes it constructs, keyed by tuples containing pointers to the type and subnodes of each node (the 'canonical form' of a node). This simple method for keying a node and determing whether nodes are equivalent works because nodes are constructed from the bottom up, so that when a given node is being constructed, we can be sure that its subnodes have already been interned and are unique.
Here is a sample transcript demonstrating how the class constructs the expression (2x+1)y  (2x+1)/y:
>>> exp = (Lit(2)*Var('x')+Lit(1))*Var('y')  \
... (Lit(2)*Var('x')+Lit(1))/Var('y')
Constructing new Lit node.
Constructing new Var node.
Constructing new Mul node.
Constructing new Lit node.
Constructing new Add node.
Constructing new Var node.
Constructing new Mul node.
Using existing Lit node.
Using existing Var node.
Using existing Mul node.
Using existing Lit node.
Using existing Add node.
Using existing Var node.
Constructing new Div node.
Constructing new Sub node.
>>> count_nodes(exp)
9
If we were using trees, we would expect 15 nodes to be constructed. But we can see that the duplicate subexpressions are reused, so that only 9 nodes need to be constructed.
Here are some issues to keep in mind: * For this particular example, finding the canonical form of a node was simple, but in general this will of course depend on your application. * It is assumed that you will not mutate the DAGs after they are constructed. * We could have written __new so that it uses __exprs.setdefault, thus making __new into a oneliner. This has the disadvantage of always constructing a new node for the second argument to setdefault whether we need it or not; in general it would probably be better to avoid this. * We could also have written all the __init functions to blindly assign attributes without checking whether they already exist. That would be fine for this example, but in general, it would again probably be better to avoid this. * Nodes are shared by all instances of Expr. * Expr.__exprs stores every unique node that is constructed, and they are never freed. * This example doesn't do much; it is up to your application to exploit the shared structure in DAGs.
References: Alex Martelli, Python in a Nutshell Thomas Parsons, An Introduction to Compiler Construction