How to install speedparser

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install speedparser

Python 2.7 | Python 3.2 | Python 3.3

Platform           Versions available
Windows (32-bit)   0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1
Windows (64-bit)   0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1
Mac OS X (10.5+)   0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1
Linux (32-bit)     0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1
Linux (64-bit)     0.1.8, 0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1, 0.1

License: MIT
Latest release: version 0.1.8 on Jan 26th, 2012

speedparser

Speedparser is a black-box-style reimplementation of the Universal Feed Parser. It reuses some feedparser code for date and author handling, but mostly re-implements feedparser's data normalization algorithms based on feedparser's output. It uses lxml for feed parsing and for optional HTML cleaning. Its compatibility with feedparser is very good for a strict subset of fields, but poor for fields outside that subset. See tests/speedparsertests.py for more information on which fields are more or less compatible and which are not.

On an Intel(R) Core(TM) i5 750, running only on one core, feedparser managed 2.5 feeds/sec on the test feed set (roughly 4200 "feeds" in tests/feeds.tar.bz2), while speedparser manages around 65 feeds/sec with HTML cleaning on and 200 feeds/sec with cleaning off.
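
A throughput comparison like the one above can be approximated with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the script that produced the quoted numbers. `feeds_per_second` and `speedup` are hypothetical helpers, and the parser is passed in as a callable so the same loop can time either library.

```python
import time


def feeds_per_second(parse, documents):
    """Time `parse` over `documents` and return (rate, failures).

    `parse` is any callable taking one feed document. `documents` is a
    list of raw feed strings already loaded into memory, so disk I/O is
    not part of the measurement.
    """
    failures = 0
    start = time.perf_counter()
    for doc in documents:
        try:
            parse(doc)
        except Exception:
            # A parser that raises still consumed time; count it and move on.
            failures += 1
    elapsed = time.perf_counter() - start
    rate = len(documents) / elapsed if elapsed > 0 else float("inf")
    return rate, failures


def speedup(baseline_rate, new_rate):
    """Ratio of two throughput figures in feeds/sec."""
    return new_rate / baseline_rate
```

Applied to the figures quoted above, `speedup(2.5, 65)` is 26.0 and `speedup(2.5, 200)` is 80.0, so speedparser is roughly 26 times faster than feedparser with HTML cleaning on and 80 times faster with it off.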

installing

pip install speedparser

usage

Usage is similar to feedparser:

>>> import speedparser
>>> result = speedparser.parse(feed)
>>> result = speedparser.parse(feed, clean_html=False)

differences

There are a few interface differences and many result differences between speedparser and feedparser. The main similarities are that both return a FeedParserDict() object (with keys accessible as attributes), both set the bozo key when an error is encountered, and various aspects of the feed and entries keys are likely to be identical or very similar.

speedparser uses different data cleaning algorithms than feedparser, and in some cases less cleaning or none at all (buyer beware). When cleaning is enabled, lxml's lxml.html.clean module is used to clean HTML, giving similar but not identical protection against various attributes and elements. If you pass your own Cleaner object as the clean_html kwarg, speedparser will use it to clean the various attributes of the feed and entries.
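
The three cleaning modes described above (default cleaning, no cleaning, and a caller-supplied Cleaner) might be selected as in the sketch below. It assumes lxml's `Cleaner` class is importable from `lxml.html.clean` (recent lxml releases ship it in the separate `lxml_html_clean` package), and the options chosen are illustrative rather than speedparser's defaults. `make_strict_cleaner` and `parse_feed` are hypothetical helpers; the parser is passed in as a callable.

```python
def make_strict_cleaner():
    """Build an lxml Cleaner with a few explicit options.

    Returns None when lxml's cleaner module is not installed, so the
    caller can fall back to the parser's default cleaning.
    """
    try:
        from lxml.html.clean import Cleaner
    except ImportError:
        return None
    # Illustrative settings only: strip scripts, styles, frames and forms.
    return Cleaner(scripts=True, javascript=True, style=True,
                   frames=True, forms=True)


def parse_feed(document, parse, cleaner=None, clean=True):
    """Call `parse` with the clean_html value matching the caller's intent."""
    if not clean:
        return parse(document, clean_html=False)    # cleaning off
    if cleaner is not None:
        return parse(document, clean_html=cleaner)  # custom Cleaner
    return parse(document)                          # default cleaning


# Intended use, assuming speedparser is installed:
#   import speedparser
#   result = parse_feed(xml_text, speedparser.parse,
#                       cleaner=make_strict_cleaner())
```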

If your application is using feedparser to consume many feeds at once and CPU is becoming a bottleneck, you might want to try out speedparser as an alternative (using feedparser as a backup). If you are writing an application that does not ingest many feeds, or where CPU is not a problem, you should use feedparser as it is flexible with bad or malformed data and has a much better test suite.
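
The "speedparser first, feedparser as a backup" arrangement suggested above could look something like the following. This is a sketch under assumptions: `parse_with_fallback` is a hypothetical helper, both parsers are passed in as callables, and retrying on an exception or a set bozo key is one possible policy rather than anything speedparser prescribes.

```python
def parse_with_fallback(document, fast_parse, slow_parse):
    """Try `fast_parse`; fall back to `slow_parse` on error or bozo result.

    Returns a (result, source) tuple where source is "fast" or "fallback".
    """
    try:
        result = fast_parse(document)
    except Exception:
        # The fast parser could not handle the document at all.
        return slow_parse(document), "fallback"
    # Both libraries set the bozo key when they encounter an error.
    if result.get("bozo"):
        return slow_parse(document), "fallback"
    return result, "fast"


# Intended use, assuming both libraries are installed:
#   import feedparser, speedparser
#   result, source = parse_with_fallback(xml_text,
#                                        speedparser.parse,
#                                        feedparser.parse)
```

Returning the source alongside the result makes it easy to log how often the fallback is taken, which indicates how much of the CPU saving survives on a real feed set.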
