pypm install chomsky

How to install chomsky

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install chomsky
License
BSD
Latest release
version 1.0.2 on May 11th, 2013

I needed a language grammar parser for the plywood project. modgrammar looked like it would be perfect, except that I couldn't get even the simplest of grammars to work. pyparsing is excellent, but it gives me back only lists and strings, not objects, and I need more than that. I would still recommend pyparsing for your project, unless you really want objects or you are building a language (chomsky has lots of built-in stuff for making programming language grammars).

Besides, I like writing parsers, and I know how I want this one to work, so screw it, I'll do it myself!

INSTALLATION

$ pip install chomsky

USAGE

Matchers

Matcher objects are the most basic building blocks. They are not smart, they return only strings and lists, and they make no assumptions about what you might be trying to build. For instance, the Chars Matcher does not assume that you want to consume whitespace.

Matcher objects are great for building a small parsing language for consistent data, where Grammar objects are not needed. But for building a language parser, you will probably use the more heavy-duty Grammar building blocks.

Char

Matches a single letter from a string of accepted letters. There are lots of built-in strings in the string module.

test/matchers/test_letter_matcher.py

matcher = Char('abcde')
matcher('a') => 'a'
matcher('bcd') => 'b'
matcher('f') => ParseException
# shorthand:
matcher = A('abcde')

import string
matcher = A(string.ascii_letters + string.digits + '_')
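
To make the behavior above concrete, here is a minimal sketch of single-character matching in plain Python. It is not chomsky's implementation: `match_char` and the local `ParseException` are hypothetical names used only for illustration, and the (result, new position) return shape is an assumption.

```python
import string


class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def match_char(accepted, text, pos=0):
    """Return (char, new_pos) if text[pos] is one of the accepted characters."""
    if pos < len(text) and text[pos] in accepted:
        return text[pos], pos + 1
    raise ParseException("expected one of %r at position %d" % (accepted, pos))


# Only the first character is consumed; the rest of the input is left alone.
print(match_char('abcde', 'bcd'))  # ('b', 1)

# The string module supplies ready-made character sets.
identifier_chars = string.ascii_letters + string.digits + '_'
print(match_char(identifier_chars, '_x'))  # ('_', 1)
```
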
Chars

Matches one or more letters from a string of accepted letters.

You can also set min and max options. min will raise a ParseException if the matched word is not long enough. Default is 1. max will stop matching once max characters are matched.

test/matchers/test_word_matcher.py
matcher = Chars('abcde')
matcher('a') => 'a'
matcher('bcd') => 'bcd'
matcher('defg') => 'de'
matcher('fghi') => ParseException

# max
matcher = Chars('abcde', max=2)
matcher('bcd') => 'bc'

# min
matcher = Chars('abcde', min=3)
matcher('ab') => ParseException
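
The min and max options can be pictured with a short sketch in plain Python. This is only an illustration of the rules described above, not the library's code: `match_chars`, `min_len`, `max_len` and the local `ParseException` are hypothetical names.

```python
class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def match_chars(accepted, text, pos=0, min_len=1, max_len=None):
    """Consume a run of accepted characters and return (run, new_pos)."""
    end = pos
    while end < len(text) and text[end] in accepted:
        if max_len is not None and end - pos >= max_len:
            break  # max reached: stop matching, but do not fail
        end += 1
    if end - pos < min_len:
        raise ParseException("needed at least %d characters at position %d"
                             % (min_len, pos))
    return text[pos:end], end


print(match_chars('abcde', 'defg'))            # ('de', 2)
print(match_chars('abcde', 'bcd', max_len=2))  # ('bc', 2)
```

Note the asymmetry: falling short of min is an error, while reaching max simply ends the match and leaves the remaining input for the next matcher.
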
Literal

Matches a literal string.

test/matchers/test_literal_matcher.py
matcher = Literal('abcde')
matcher('abcde') => 'abcde'
matcher('abcdefg') => 'abcde'
matcher('abcd') => ParseException
matcher('fghi') => ParseException
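Whitespace

Matches a run of whitespace characters. The default set is space and tab; pass a string to accept other characters.
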
test/matchers/test_whitespace_matcher.py
matcher = Whitespace()  # default is " \t"
matcher("    ") => "    "
matcher(" \t\n ") => " \t"
matcher = Whitespace(" \t\n")
matcher(" \t\n ") => " \t\n "
Regex

Regex matchers have two options: group and advance.

group says which group or groups to return. The default is 0 (the entire match). A list or tuple of groups returns a list of results. advance says which group the cursor advances past. The default is 0 (the entire match). Together they give a quick way to build a matching system for consistently formatted data.

test/matchers/test_regex_matcher.py
matcher = Regex("([a-zA-Z_][0-9])")
matcher('a1') => 'a1'

# group
matcher = Regex("([a-zA-Z_])[0-9]", group=1)
matcher('a1') => 'a'

# to demonstrate `advance`, I have to add two Regex Matchers together,
# and the sum returns a list
matcher = Regex("([a-zA-Z_])[0-9]", group=1, advance=1) + Regex("([0-9])", group=1)
matcher('a1') => ['a', '1']
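
The difference between group and advance is easiest to see with Python's own re module. The sketch below is an assumption about how such options could be implemented, not chomsky's code: `match_regex` and the local `ParseException` are hypothetical names.

```python
import re


class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def match_regex(pattern, text, pos=0, group=0, advance=0):
    """Return (selected group(s), position just past the `advance` group)."""
    m = re.compile(pattern).match(text, pos)
    if m is None:
        raise ParseException("pattern %r did not match at position %d"
                             % (pattern, pos))
    if isinstance(group, (list, tuple)):
        result = [m.group(g) for g in group]
    else:
        result = m.group(group)
    # m.end(n) is the index just past group n; group 0 is the whole match.
    return result, m.end(advance)


# The letter must be followed by a digit, but only the letter is consumed ...
first, pos = match_regex(r"([a-zA-Z_])[0-9]", "a1", group=1, advance=1)
print((first, pos))    # ('a', 1)
# ... so the digit is still available to the next matcher.
second, pos = match_regex(r"([0-9])", "a1", pos, group=1)
print((second, pos))   # ('1', 2)
```
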
Sequence

There are two flavors of Sequence. One you declare yourself, called Sequence; the other is created automatically when you add or multiply Matcher objects. Don't worry about that one, it "just works" (we saw it above in the Regex example).

test/matchers/test_sequence_matcher.py
matcher = Sequence(Literal('Hello '), Literal('World'), Char('!.'))
matcher('Hello World!') => ['Hello ', 'World', '!']
matcher('Hello World.') => ['Hello ', 'World', '.']
matcher('Hello, World.') => ParseException

The automatic Sequence type is created whenever you use addition to chain Matchers or multiplication to repeat them.

Addition:

test/matchers/test_matcher_addition.py
matcher = Literal('Hello ') + Literal('World') + Char('!.')
matcher('Hello World!') => ['Hello ', 'World', '!']
matcher('Hello World.') => ['Hello ', 'World', '.']
matcher('Hello, World.') => ParseException

Multiplication:

test/matcher/test_matcher_multiplication.py
import string
matcher = (Chars(string.ascii_letters) + Literal(' ')) * 3
matcher('why hello there ') => [['why', ' '], ['hello', ' '], ['there', ' ']]
matcher('not enough spaces') => ParseException
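
For the curious, the "just works" part can be sketched with Python's operator hooks. This is a guess at the general technique, not the library's source: `Matcher`, `Lit` and `Seq` here are small hypothetical classes written for this example. Addition extends a flat sequence, while multiplication repeats the whole sequence, which is why the multiplied results come back nested.

```python
class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


class Matcher(object):
    def match(self, text, pos):
        raise NotImplementedError

    def __call__(self, text):
        result, _ = self.match(text, 0)
        return result

    def __add__(self, other):
        return Seq(self, other)

    def __mul__(self, count):
        # Repeat this matcher as a unit, `count` times.
        return Seq(*([self] * count))


class Lit(Matcher):
    def __init__(self, literal):
        self.literal = literal

    def match(self, text, pos):
        if text.startswith(self.literal, pos):
            return self.literal, pos + len(self.literal)
        raise ParseException("expected %r at position %d" % (self.literal, pos))


class Seq(Matcher):
    def __init__(self, *matchers):
        self.matchers = list(matchers)

    def match(self, text, pos):
        results = []
        for matcher in self.matchers:
            result, pos = matcher.match(text, pos)
            results.append(result)
        return results, pos

    def __add__(self, other):
        # Adding to a sequence appends to it, keeping the result list flat.
        return Seq(*(self.matchers + [other]))


greeting = Lit('Hello ') + Lit('World') + Lit('!')
print(greeting('Hello World!'))  # ['Hello ', 'World', '!']

pairs = (Lit('hi') + Lit(' ')) * 2
print(pairs('hi hi '))           # [['hi', ' '], ['hi', ' ']]
```
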
NMatches

NMatches is not an intuitively named class, but its child classes are, and you'll probably use them a lot.

ZeroOrMore:

test/matcher/test_zero_or_more_matcher.py
matcher = ZeroOrMore(Literal('hi'))
matcher('') => []
matcher('hi') => ['hi']
matcher('hihi') => ['hi', 'hi']

OneOrMore:

test/matcher/test_one_or_more_matcher.py
matcher = OneOrMore(Literal('hi'))
matcher('hi') => ['hi']
matcher('hihi') => ['hi', 'hi']
matcher('') => ParseException

Optional:

test/matcher/test_optional_matcher.py
matcher = Literal('Hello') + Optional(Literal(',')) + Literal(' ') + Literal('World')
matcher('Hello World') => ['Hello', [], ' ', 'World']
matcher('Hello, World') => ['Hello', [','], ' ', 'World']
matcher('Hello, Bozo') => ParseException

NMatches:

test/matcher/test_nmatcher.py
matcher = NMatches(Literal('hi'), min=2, max=3)
matcher('hi') => ParseException
matcher('hihi') => ['hi', 'hi']
matcher('hihihi') => ['hi', 'hi', 'hi']
matcher('hihihihi') => ['hi', 'hi', 'hi']  # only 3 matches
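
All four repetition matchers can be seen as one loop with different bounds. The sketch below illustrates that idea in plain Python; the function names (`n_matches`, `zero_or_more`, `one_or_more`, `optional`, `literal`) and the local `ParseException` are hypothetical, and the mapping of each named matcher to a (min, max) pair is inferred from the examples above rather than taken from the library's source.

```python
class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def literal(expected):
    def matcher(text, pos=0):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        raise ParseException("expected %r at position %d" % (expected, pos))
    return matcher


def n_matches(matcher, min_count=0, max_count=None):
    def repeated(text, pos=0):
        results = []
        while max_count is None or len(results) < max_count:
            try:
                result, new_pos = matcher(text, pos)
            except ParseException:
                break  # no further repetition; decide below if that is enough
            if new_pos == pos:
                break  # empty match: stop rather than loop forever
            results.append(result)
            pos = new_pos
        if len(results) < min_count:
            raise ParseException("needed at least %d matches" % min_count)
        return results, pos
    return repeated


def zero_or_more(matcher):
    return n_matches(matcher, 0)


def one_or_more(matcher):
    return n_matches(matcher, 1)


def optional(matcher):
    return n_matches(matcher, 0, 1)


print(n_matches(literal('hi'), 2, 3)('hihihihi'))  # (['hi', 'hi', 'hi'], 6)
print(optional(literal(','))(' World'))            # ([], 0)
```
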
Any

Given a list of Matchers, any of them can match (tested in order left-to-right). The first to match is returned.

test/matcher/test_any_matcher.py
matcher = Any(Literal('Joey'), Literal('Bob'), Literal('Bill'))
matcher('Bob') => 'Bob'
matcher('Jane') => ParseException
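
Because alternatives are tried left to right and the first success wins, their order matters. Here is a minimal sketch of ordered choice in plain Python; `any_of`, `literal` and the local `ParseException` are hypothetical names for illustration, not the library's API.

```python
class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def literal(expected):
    def matcher(text, pos=0):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        raise ParseException("expected %r at position %d" % (expected, pos))
    return matcher


def any_of(*matchers):
    def choice(text, pos=0):
        for matcher in matchers:
            try:
                return matcher(text, pos)  # first alternative to match wins
            except ParseException:
                continue
        raise ParseException("no alternative matched at position %d" % pos)
    return choice


names = any_of(literal('Joey'), literal('Bob'), literal('Bill'))
print(names('Bob'))  # ('Bob', 3)

# A shorter alternative listed first shadows a longer one.
print(any_of(literal('Jo'), literal('Joey'))('Joey'))  # ('Jo', 2)
```
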
Look-ahead and Behind

Looking-ahead is simple and low-cost. The NextIs matcher makes sure that the Matcher would pass, but then rolls back the cursor and does not return a Result. If the Matcher fails, an exception is raised.

Looking behind is much more expensive, because the number of characters to look at is not known beforehand. PrevIs can make a "best guess" by using the `minimum_length` and `maximum_length` methods that all the Matcher classes implement (the base class returns 0 and float('inf')). A Literal, for example, has a definite length that must be present: no more and no fewer characters. The other classes also provide this min/max length calculation, but it yields only a modest performance increase.

The PrevIs matcher does not require that the previous token be an instance of the specified matcher, only that the buffer previous to the current location match. The buffer is rolled back until a match is found, or until the beginning of the buffer is reached. Sound resource intensive? Consider PrevIsNot! It looks backwards, hoping that the buffer never matches, no matter how far back it goes.
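
The cost difference described above shows up clearly in a small sketch. The code below is plain Python written for illustration, not chomsky's implementation: `next_is`, `prev_is`, `prev_is_not`, `literal` and the local `ParseException` are hypothetical names, and the sketch assumes that a look-behind match must end exactly at the current cursor position.

```python
class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


def literal(expected):
    def matcher(text, pos=0):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        raise ParseException("expected %r at position %d" % (expected, pos))
    return matcher


def next_is(matcher):
    def check(text, pos):
        matcher(text, pos)  # a single attempt; raises if it fails
        return None, pos    # no result, and the cursor is rolled back
    return check


def prev_is(matcher, min_len=0, max_len=float('inf')):
    def check(text, pos):
        # Without length bounds, every start back to index 0 may be tried.
        longest = int(min(pos, max_len))
        for length in range(min_len, longest + 1):
            try:
                _, end = matcher(text, pos - length)
            except ParseException:
                continue
            if end == pos:  # assumption: the match must end at the cursor
                return None, pos
        raise ParseException("look-behind failed at position %d" % pos)
    return check


def prev_is_not(matcher, min_len=0, max_len=float('inf')):
    look_behind = prev_is(matcher, min_len, max_len)

    def check(text, pos):
        try:
            look_behind(text, pos)
        except ParseException:
            return None, pos  # nothing matched behind the cursor: success
        raise ParseException("unexpected match before position %d" % pos)
    return check


# A literal has a fixed length, so only one start position is tried.
dash_before = prev_is(literal('-'), min_len=1, max_len=1)
print(dash_before('-1', 1))  # (None, 1)
```

Note that `prev_is_not` can only succeed after every candidate start has been tried and rejected, which is why it is the most expensive of the lot.
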

NextIs:

test/matcher/test_nextis_matcher.py
matcher = Optional(Literal('-')) + NextIs(Chars('123456789')) + Chars('1234567890')
matcher('1') => [[], '1']
matcher('-1') => [['-'], '1']
matcher('-123') => [['-'], '123']
matcher('-0') => ParseException

NextIsNot:

test/matcher/test_nextis_matcher.py
matcher = Optional(Literal('-')) + NextIsNot('0') + Chars('1234567890')
matcher('1') => [[], '1']
matcher('-1') => [['-'], '1']
matcher('-123') => [['-'], '123']
matcher('-0') => ParseException

PrevIs:

test/matcher/test_nextis_matcher.py
matcher = Chars('-.') + PrevIs('-') + Chars('1234567890')
matcher('-1') => ['-', '1']
matcher('.123') => ParseException

PrevIsNot:

test/matcher/test_nextis_matcher.py
matcher = Chars('abc') + PrevIsNot('c') + Chars('abc')
matcher('ab') => ['a', 'b']
matcher('abc') => ['ab', 'c']
matcher('abcabc') => ['abcab', 'c']
matcher('cc') => ParseException
Grammars

Grammar objects are what you will want to work with if you are building a language grammar. They are composed of Matcher classes (and other Grammar classes), but the objects they return are instances of the Grammar, not simple strings and lists.

The built-in Grammars are meant to help you understand how they work, and to be used in your own language.
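
To show what "returns an instance, not a string" can mean in practice, here is a minimal sketch in plain Python. It does not use chomsky's Grammar API, which is not documented here: `IntegerNode`, its `parse` class method and the local `ParseException` are hypothetical names, and the regular expression is just one reasonable definition of an integer.

```python
import re


class ParseException(Exception):
    """Local stand-in for the library's exception (assumption)."""


class IntegerNode(object):
    """Hypothetical result object that keeps the source text and its value."""

    _pattern = re.compile(r'-?[1-9][0-9]*|0')

    def __init__(self, source):
        self.source = source
        self.value = int(source)

    @classmethod
    def parse(cls, text, pos=0):
        m = cls._pattern.match(text, pos)
        if m is None:
            raise ParseException("expected an integer at position %d" % pos)
        return cls(m.group(0)), m.end()

    def __repr__(self):
        return 'IntegerNode(%r)' % self.source


node, end = IntegerNode.parse('-123+4')
print(node)            # IntegerNode('-123')
print(node.value + 1)  # -122
```

The caller gets an object it can inspect or evaluate, rather than a string it has to interpret again.
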

Numbers

Integer:

test/matcher/test_nextis_matcher.py
matcher = Optional(Literal('-')) + NextIsNot('0') + Chars('1234567890')
matcher('1') => [[], '1']
matcher('-1') => [['-'], '1']
matcher('-123') => [['-'], '123']
matcher('-0') => ParseException
Todo
QuotedString, Number, Integer, Float, Hexadecimal, Octal, Binary
LineComment, BlockComment, Block, IndentedBlock

TEST

$ pip install pytest
$ py.test

LICENSE

Copyright (c) 2012, Colin T.A. Gray. All rights reserved.

author: Colin T.A. Gray
copyright: 2012 Colin T.A. Gray <http://colinta.com/>
license: simplified BSD, see LICENSE for more details.


Last updated May 11th, 2013
