Welcome, guest | Sign In | My Account | Store | Cart

An example showing how Python's features can be used to optimize the construction of tree-like data structures in order to produce directed acyclic graphs (DAGs) instead.

Python, 84 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
"""
Expr             This is a class hierarchy for representing arithmetic
  |--Atom        expressions as DAGs.  To keep the example simple, I
  |    |--Lit    have factored as much as I could into the base classes.
  |    |--Var    The most important method is Expr.__new__, which checks
  |              whether there is an existing node equivalent to the one
  |--UnOp        being constructed.  If there is, then __new__ returns
  |    |--Neg    the existing node; otherwise __new__ constructs a new
  |    |--Abs    node, then interns and returns it.
  |
  |--BinOp       For convenience, I have overloaded the arithmetic
       |--Add    operators so that they can be used to construct Exprs.
       |--Sub    E.g., something like Lit(1) + Lit(2) * -Var('x') is
       |--Mul    equivalent to Add(Lit(1), Mul(Lit(2), Neg(Var('x')))).
       |--Div    But otherwise I have kept features to a minimum.
"""

class Expr(object):
    __exprs = {}  # dict for interning nodes

    def __new__(cls, *args):
        # Get the canonical form of the node, like a hash key.  The
        # simple technique below works because nodes are constructed
        # bottom-up, so we know that the contents of args are unique.
        canonical = (cls, args)
        # Now see if an equivalent node is already interned.
        try:
            Expr.__exprs[canonical]
            print "Using existing %s node." % cls.__name__
        except KeyError:
            print "Constructing new %s node." % cls.__name__
            Expr.__exprs[canonical] = object.__new__(cls, *args)
        return Expr.__exprs[canonical]

    def __neg__(self): return Neg(self)
    def __abs__(self): return Abs(self)
    def __add__(self, r): return Add(self, r)
    def __sub__(self, r): return Sub(self, r)
    def __mul__(self, r): return Mul(self, r)
    def __div__(self, r): return Div(self, r)


class Atom(Expr):
    def __init__(self, numOrStr):
        try: self.v
        except AttributeError: self.v = numOrStr

class Lit(Atom): pass
class Var(Atom): pass


class UnOp(Expr):
    def __init__(self, operand):
        try: self.o
        except AttributeError: self.o = operand    

class Neg(UnOp): pass
class Abs(UnOp): pass


class BinOp(Expr):
    def __init__(self, left, right):
        try: self.l  # or self.r; whichever
        except AttributeError: self.l = left; self.r = right

class Add(BinOp): pass
class Sub(BinOp): pass
class Mul(BinOp): pass
class Div(BinOp): pass


# A function to count the nodes in an expression
def count_nodes(exp):
    visited = []

    def walk(exp):
        if exp in visited: return 0  # don't count nodes already visited

        visited.append(exp)
        if isinstance(exp, Atom): return 1
        elif isinstance(exp, UnOp): return 1 + walk(exp.o)
        elif isinstance(exp, BinOp): return 1 + walk(exp.l) + walk(exp.r)

    return walk(exp)

Trees are useful, but for certain applications, they can lead to the redundant representation of data. For example, a naive implementation of a parse tree will represent common subexpressions as distinct subtrees, when it would be more efficient to represent them with a single, shared structure. For cases such as this, an appropriate data structure is a directed acyclic graph (DAG). DAGs are similar to trees, except that a node can have more than one parent (or, different nodes can share children); a class hierarchy forms a DAG, for example. Python's dictionaries and object-oriented features make it easy to modify the constructors of a tree-like data structure so that DAGs are constructed instead.

In this recipe, the important work occurs in Expr.__new. This function checks if there is already an Expr equivalent to the one being constructed and returns it if it exists; otherwise it constructs a new node and interns it in a dictionary. The Expr class keeps a dictionary containing the nodes it constructs, keyed by tuples containing pointers to the type and subnodes of each node (the 'canonical form' of a node). This simple method for keying a node and determing whether nodes are equivalent works because nodes are constructed from the bottom up, so that when a given node is being constructed, we can be sure that its subnodes have already been interned and are unique.

Here is a sample transcript demonstrating how the class constructs the expression (2x+1)y - (2x+1)/y:

>>> exp = (Lit(2)*Var('x')+Lit(1))*Var('y') - \
...       (Lit(2)*Var('x')+Lit(1))/Var('y')
Constructing new Lit node.
Constructing new Var node.
Constructing new Mul node.
Constructing new Lit node.
Constructing new Add node.
Constructing new Var node.
Constructing new Mul node.
Using existing Lit node.
Using existing Var node.
Using existing Mul node.
Using existing Lit node.
Using existing Add node.
Using existing Var node.
Constructing new Div node.
Constructing new Sub node.



>>> count_nodes(exp)
9

If we were using trees, we would expect 15 nodes to be constructed. But we can see that the duplicate subexpressions are reused, so that only 9 nodes need to be constructed.

Here are some issues to keep in mind: * For this particular example, finding the canonical form of a node was simple, but in general this will of course depend on your application. * It is assumed that you will not mutate the DAGs after they are constructed. * We could have written __new so that it uses __exprs.setdefault, thus making __new into a one-liner. This has the disadvantage of always constructing a new node for the second argument to setdefault whether we need it or not; in general it would probably be better to avoid this. * We could also have written all the __init functions to blindly assign attributes without checking whether they already exist. That would be fine for this example, but in general, it would again probably be better to avoid this. * Nodes are shared by all instances of Expr. * Expr.__exprs stores every unique node that is constructed, and they are never freed. * This example doesn't do much; it is up to your application to exploit the shared structure in DAGs.

References: Alex Martelli, Python in a Nutshell Thomas Parsons, An Introduction to Compiler Construction

Created by Aaron Leung on Sat, 4 Sep 2004 (PSF)
Python recipes (4591)
Aaron Leung's recipes (1)

Required Modules

  • (none specified)

Other Information and Tasks