Welcome, guest | Sign In | My Account | Store | Cart

This recipe shows a Finite State Machine (FSM) that can be used for small parsing tasks. The code is quite simple. The bulk of it is comments. In addition to state this FSM also maintains a user defined "memory". So this FSM is a Push-down Automata (PDA) since a PDA is a FSM + memory. This module contains an example function that demonstrates a simple RPN expression evaluator.

Python, 331 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
#!/usr/bin/env python

"""This module implements a Finite State Machine (FSM). In addition to state
this FSM also maintains a user defined "memory". So this FSM can be used as a
Push-down Automata (PDA) since a PDA is a FSM + memory.

The following describes how the FSM works, but you will probably also need to
see the example function to understand how the FSM is used in practice.

You define an FSM by building tables of transitions. For a given input symbol
the process() method uses these tables to decide what action to call and what
the next state will be. The FSM has a table of transitions that associate:

        (input_symbol, current_state) --> (action, next_state)

Where "action" is a function you define. The symbols and states can be any
objects. You use the add_transition() and add_transition_list() methods to add
to the transition table. The FSM also has a table of transitions that
associate:

        (current_state) --> (action, next_state)

You use the add_transition_any() method to add to this transition table. The
FSM also has one default transition that is not associated with any specific
input_symbol or state. You use the set_default_transition() method to set the
default transition.

When an action function is called it is passed a reference to the FSM. The
action function may then access attributes of the FSM such as input_symbol,
current_state, or "memory". The "memory" attribute can be any object that you
want to pass along to the action functions. It is not used by the FSM itself.
For parsing you would typically pass a list to be used as a stack.

The processing sequence is as follows. The process() method is given an
input_symbol to process. The FSM will search the table of transitions that
associate:

        (input_symbol, current_state) --> (action, next_state)

If the pair (input_symbol, current_state) is found then process() will call the
associated action function and then set the current state to the next_state.

If the FSM cannot find a match for (input_symbol, current_state) it will then
search the table of transitions that associate:

        (current_state) --> (action, next_state)

If the current_state is found then the process() method will call the
associated action function and then set the current state to the next_state.
Notice that this table lacks an input_symbol. It lets you define transitions
for a current_state and ANY input_symbol. Hence, it is called the "any" table.
Remember, it is always checked after first searching the table for a specific
(input_symbol, current_state).

For the case where the FSM did not match either of the previous two cases the
FSM will try to use the default transition. If the default transition is
defined then the process() method will call the associated action function and
then set the current state to the next_state. This lets you define a default
transition as a catch-all case. You can think of it as an exception handler.
There can be only one default transition.

Finally, if none of the previous cases are defined for an input_symbol and
current_state then the FSM will raise an exception. This may be desirable, but
you can always prevent this just by defining a default transition.

Noah Spurrier 20020822
"""

class ExceptionFSM(Exception):

    """This is the FSM Exception class."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return `self.value`

class FSM:

    """This is a Finite State Machine (FSM).
    """

    def __init__(self, initial_state, memory=None):

        """This creates the FSM. You set the initial state here. The "memory"
        attribute is any object that you want to pass along to the action
        functions. It is not used by the FSM. For parsing you would typically
        pass a list to be used as a stack. """

        # Map (input_symbol, current_state) --> (action, next_state).
        self.state_transitions = {}
        # Map (current_state) --> (action, next_state).
        self.state_transitions_any = {}
        self.default_transition = None

        self.input_symbol = None
        self.initial_state = initial_state
        self.current_state = self.initial_state
        self.next_state = None
        self.action = None
        self.memory = memory

    def reset (self):

        """This sets the current_state to the initial_state and sets
        input_symbol to None. The initial state was set by the constructor
        __init__(). """

        self.current_state = self.initial_state
        self.input_symbol = None

    def add_transition (self, input_symbol, state, action=None, next_state=None):

        """This adds a transition that associates:

                (input_symbol, current_state) --> (action, next_state)

        The action may be set to None in which case the process() method will
        ignore the action and only set the next_state. The next_state may be
        set to None in which case the current state will be unchanged.

        You can also set transitions for a list of symbols by using
        add_transition_list(). """

        if next_state is None:
            next_state = state
        self.state_transitions[(input_symbol, state)] = (action, next_state)

    def add_transition_list (self, list_input_symbols, state, action=None, next_state=None):

        """This adds the same transition for a list of input symbols.
        You can pass a list or a string. Note that it is handy to use
        string.digits, string.whitespace, string.letters, etc. to add
        transitions that match character classes.

        The action may be set to None in which case the process() method will
        ignore the action and only set the next_state. The next_state may be
        set to None in which case the current state will be unchanged. """

        if next_state is None:
            next_state = state
        for input_symbol in list_input_symbols:
            self.add_transition (input_symbol, state, action, next_state)

    def add_transition_any (self, state, action=None, next_state=None):

        """This adds a transition that associates:

                (current_state) --> (action, next_state)

        That is, any input symbol will match the current state.
        The process() method checks the "any" state associations after it first
        checks for an exact match of (input_symbol, current_state).

        The action may be set to None in which case the process() method will
        ignore the action and only set the next_state. The next_state may be
        set to None in which case the current state will be unchanged. """

        if next_state is None:
            next_state = state
        self.state_transitions_any [state] = (action, next_state)

    def set_default_transition (self, action, next_state):

        """This sets the default transition. This defines an action and
        next_state if the FSM cannot find the input symbol and the current
        state in the transition list and if the FSM cannot find the
        current_state in the transition_any list. This is useful as a final
        fall-through state for catching errors and undefined states.

        The default transition can be removed by setting the attribute
        default_transition to None. """

        self.default_transition = (action, next_state)

    def get_transition (self, input_symbol, state):

        """This returns (action, next state) given an input_symbol and state.
        This does not modify the FSM state, so calling this method has no side
        effects. Normally you do not call this method directly. It is called by
        process().

        The sequence of steps to check for a defined transition goes from the
        most specific to the least specific.

        1. Check state_transitions[] that match exactly the tuple,
            (input_symbol, state)

        2. Check state_transitions_any[] that match (state)
            In other words, match a specific state and ANY input_symbol.

        3. Check if the default_transition is defined.
            This catches any input_symbol and any state.
            This is a handler for errors, undefined states, or defaults.

        4. No transition was defined. If we get here then raise an exception.
        """

        if self.state_transitions.has_key((input_symbol, state)):
            return self.state_transitions[(input_symbol, state)]
        elif self.state_transitions_any.has_key (state):
            return self.state_transitions_any[state]
        elif self.default_transition is not None:
            return self.default_transition
        else:
            raise ExceptionFSM ('Transition is undefined: (%s, %s).' %
                (str(input_symbol), str(state)) )

    def process (self, input_symbol):

        """This is the main method that you call to process input. This may
        cause the FSM to change state and call an action. This method calls
        get_transition() to find the action and next_state associated with the
        input_symbol and current_state. If the action is None then the action
        is not called and only the current state is changed. This method
        processes one complete input symbol. You can process a list of symbols
        (or a string) by calling process_list(). """

        self.input_symbol = input_symbol
        (self.action, self.next_state) = self.get_transition (self.input_symbol, self.current_state)
        if self.action is not None:
            self.action (self)
        self.current_state = self.next_state
        self.next_state = None

    def process_list (self, input_symbols):

        """This takes a list and sends each element to process(). The list may
        be a string or any iterable object. """

        for s in input_symbols:
            self.process (s)

##############################################################################
# The following is an example that demonstrates the use of the FSM class to
# process an RPN expression. Run this module from the command line. You will
# get a prompt > for input. Enter an RPN Expression. Numbers may be integers.
# Operators are * / + - Use the = sign to evaluate and print the expression.
# For example: 
#
#    167 3 2 2 * * * 1 - =
#
# will print:
#
#    2003
##############################################################################

import sys, os, traceback, optparse, time, string

#
# These define the actions. 
# Note that "memory" is a list being used as a stack.
#

def BeginBuildNumber (fsm):
    fsm.memory.append (fsm.input_symbol)

def BuildNumber (fsm):
    s = fsm.memory.pop ()
    s = s + fsm.input_symbol
    fsm.memory.append (s)

def EndBuildNumber (fsm):
    s = fsm.memory.pop ()
    fsm.memory.append (int(s))

def DoOperator (fsm):
    ar = fsm.memory.pop()
    al = fsm.memory.pop()
    if fsm.input_symbol == '+':
        fsm.memory.append (al + ar)
    elif fsm.input_symbol == '-':
        fsm.memory.append (al - ar)
    elif fsm.input_symbol == '*':
        fsm.memory.append (al * ar)
    elif fsm.input_symbol == '/':
        fsm.memory.append (al / ar)

def DoEqual (fsm):
    print str(fsm.memory.pop())

def Error (fsm):
    print 'That does not compute.'
    print str(fsm.input_symbol)

def main():

    """This is where the example starts and the FSM state transitions are
    defined. Note that states are strings (such as 'INIT'). This is not
    necessary, but it makes the example easier to read. """

    f = FSM ('INIT', []) # "memory" will be used as a stack.
    f.set_default_transition (Error, 'INIT')
    f.add_transition_any  ('INIT', None, 'INIT')
    f.add_transition      ('=',               'INIT',            DoEqual,          'INIT')
    f.add_transition_list (string.digits,     'INIT',            BeginBuildNumber, 'BUILDING_NUMBER')
    f.add_transition_list (string.digits,     'BUILDING_NUMBER', BuildNumber,      'BUILDING_NUMBER')
    f.add_transition_list (string.whitespace, 'BUILDING_NUMBER', EndBuildNumber,   'INIT')
    f.add_transition_list ('+-*/',            'INIT',            DoOperator,       'INIT')

    print
    print 'Enter an RPN Expression.'
    print 'Numbers may be integers. Operators are * / + -'
    print 'Use the = sign to evaluate and print the expression.'
    print 'For example: '
    print '    167 3 2 2 * * * 1 - ='
    inputstr = raw_input ('> ')
    f.process_list(inputstr)

if __name__ == '__main__':
    try:
        start_time = time.time()
        parser = optparse.OptionParser(formatter=optparse.TitledHelpFormatter(), usage=globals()['__doc__'], version='$Id$')
        parser.add_option ('-v', '--verbose', action='store_true', default=False, help='verbose output')
        (options, args) = parser.parse_args()
        if options.verbose: print time.asctime()
        main()
        if options.verbose: print time.asctime()
        if options.verbose: print 'TOTAL TIME IN MINUTES:',
        if options.verbose: print (time.time() - start_time) / 60.0
        sys.exit(0)
    except KeyboardInterrupt, e: # Ctrl-C
        raise e
    except SystemExit, e: # sys.exit()
        raise e
    except Exception, e:
        print 'ERROR, UNEXPECTED EXCEPTION'
        print str(e)
        traceback.print_exc()
        os._exit(1)

Finite State Machines and Push-Down Automata are a fundamental concepts in computer science. They serve practical as well theoretical purposes. FSMs are the foundation of scanners, parsers, and regular expressions. FSMs are used to model real world systems such as network protocols, synchronous machines such as a Mr. Coffee, or your life (sleep, eat, work, go to the bathroom).

This module shows how a FSM can be implemented with just a thin wrapper around a few dictionaries. If this module is stripped of all the doc strings then the longest method is only 9 lines of code. This FSM may not be the most efficient, but it's fine for small parsing and finite state machine jobs. It's fits a niche between a mess of regular expressions and a more sophisticated parser generator.

There are number of Python packages that will do parsing and scanning. See BisonGen, Yapps, kwParsing, Spark, and PLex.

10 comments

Ghost Riter 19 years, 11 months ago  # | flag

Info on fsm. Where can I find more good info, tutorials etc on finite state machines.

Krasna Halopti 19 years, 9 months ago  # | flag
Wade Leftwich 19 years, 8 months ago  # | flag

Subclass for unhashable input_symbols. I wanted to use this recipe to process unhashable "tokens" (in this case, element objects from effbot's most excellent elementtidy). Here's a subclass to process generic objects.

class ObjFSM(FSM):
    '''A subclass of FSM where input_symbol may be any kind of object, even an unhashable one.
    For each input_symbol to process, the machine will try a sequence of functions;
    the first to return True determines (action, next_state).
    '''

    def add_transition(self, test, state, action, next_state):
        self.state_transitions.setdefault(state, []).append((test, action, next_state))

    def get_transition(self, input_symbol, state):
        #input_symbol arg is not used, but we keep it for compatibility with FSM class
        for (test, action, next_state) in self.state_transitions.get(state, []):
            if test(self):
                return (action, next_state)
        try:
            return self.state_transitions_any[self.current_state]
        except KeyError:
            pass
        if self.default_transition != None:
            return self.default_transition
        raise ExceptionFSM('Transition is undefined: (%s, %s).' %
            (str(input_symbol), str(self.current_state)) )
Wade Leftwich 19 years, 8 months ago  # | flag

Allow a next_state of "no change" One more little modification that I found useful: if next_state is None, keep self.current_state unchanged.

def process(self, input_symbol):
    self.input_symbol = input_symbol
    (action, next_state) = self.get_transition(self.input_symbol, self.current_state)
    if action is not None:
        action(self)
    if next_state is not None:
        self.current_state = next_state
David Antliff 18 years, 9 months ago  # | flag

And allow transitions to return data to FSM caller. A simple change allows transition routines to return data back to the FSM caller:

def process(self, input_symbol):
    self.input_symbol = input_symbol
    (action, next_state) = self.get_transition(self.input_symbol, self.current_state)
    if action is not None:
        ret = action(self)
    if next_state is not None:
        self.current_state = next_state
    return ret
Karl Scalet 18 years, 4 months ago  # | flag

get_transition somewhat incorrect. passed to the function is state, whereas inside the member function current_state is used.

Noah Spurrier (author) 16 years, 5 months ago  # | flag

More efficient to add this logic to the add_transition_* methods. It's better to add the check for next_state is None in the add_transition_* methods. This keeps the state transition list cleaner and it make process() faster since it doesn't have to check for next_state is None.

Noah Spurrier (author) 16 years, 5 months ago  # | flag

FIXED. Yes, this was a mistake. It should not look at self._current_state.

Konrad Delong 16 years, 5 months ago  # | flag

And yet another implementation... http://konryd-scripts.googlecode.com/svn/trunk/automata/automaton.py

I wrote that couple of weeks ago. Does pretty much the same job, and generates animations! (needs mencoder)

Bogdan Dumitriu 14 years, 8 months ago  # | flag

Apply the following patch to the original in order to obtain a version that adds an add_empty_transition method. The new method behaves like add_transition_any but without consuming the input character. This is useful for specifying a "derived state", i.e., a state that overrides a subset of the transitions of its "base state", while reusing the rest.

--- fsm.py      Fri Aug 14 11:15:27 2009
+++ fsm-mod.py  Thu Aug 13 00:29:52 2009
@@ -90,6 +90,8 @@
         self.state_transitions = {}
         # Map (current_state) --> (action, next_state).
         self.state_transitions_any = {}
+        # Map (current_state) --> (action, next_state) without consuming symbol.
+        self.state_empty_transitions = {}
         self.default_transition = None

         self.input_symbol = None
@@ -159,6 +161,25 @@
             next_state = state
         self.state_transitions_any [state] = (action, next_state)

+    def add_empty_transition(self, state, next_state, action = None):
+
+        """This adds a transition that associates:
+
+                (current_state) --> (action, next_state)
+
+        but without consuming the input symbol. This is identical to
+        add_transition_any, except that it does not consume the input symbol.
+        add_transition_any takes precedence over add_empty_transition for any
+        given state, so if you want add_empty_transition to have effect, you
+        should not call add_transition_any for the same state.
+
+        Unlike add_transition_any, a next_state = None is not acceptable for
+        this method, as it would lead to an infinite loop. Also beware of
+        circular references, as they will lead to infinite loops as well."""
+
+        if next_state is not None:
+            self.state_empty_transitions[state] = (action, next_state)
+
     def set_default_transition (self, action, next_state):

         """This sets the default transition. This defines an action and
@@ -199,6 +220,11 @@
             return self.state_transitions[(input_symbol, state)]
         elif self.state_transitions_any.has_key (state):
             return self.state_transitions_any[state]
+        elif self.state_empty_transitions.has_key(state):
+            (action, next_state) = self.state_empty_transitions[state]
+            if action is not None:
+                action(self)
+            return self.get_transition(input_symbol, next_state)
         elif self.default_transition is not None:
             return self.default_transition
         else: