How to install collective.soupstrainer

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install collective.soupstrainer
Python 2.7, Python 3.2, Python 3.3

Windows (32-bit): 1.0 available
Windows (64-bit): 1.0 available
Mac OS X (10.5+): 1.0 available
Linux (32-bit): 1.0 available
Linux (64-bit): 1.0 available

License: GPL
Latest release: version 1.0 on Jan 5th, 2011

collective.soupstrainer

HTML from an outside source, whether user input or data gathered by scraping, often needs to be cleaned up. The SoupStrainer class in collective.soupstrainer makes this easy. It uses BeautifulSoup to parse and clean up the HTML. The constructor of the class takes three arguments.

exclusions A list of two-item tuples. The first item is a list of tag names, the second a list of attribute names. If the attribute list is empty, each tag in the first list is removed completely from the HTML that is passed in. If the tag list is empty, each listed attribute is removed from every tag. If both tags and attributes are listed, the attributes are removed only from the matching tags.

style_whitelist A whitelist of CSS styles allowed in 'style' attributes. All other styles are removed.

class_blacklist A blacklist of CSS classes. Each matching class is removed from 'class' attributes.
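
The three arguments above can be read as a small set of rules. The following is a minimal sketch of those rules using only the standard library. It is not the implementation in collective.soupstrainer: the function names (compile_exclusions, clean_style, clean_class, clean_attrs) are hypothetical, and it assumes that style_whitelist holds CSS property names and that attributes arrive as a plain dict.

```python
def compile_exclusions(exclusions):
    """Split (tags, attributes) tuples into the three documented cases."""
    drop_tags, drop_attrs, per_tag = set(), set(), {}
    for tags, attrs in exclusions:
        if tags and not attrs:
            drop_tags.update(tags)        # tags only: remove the tag entirely
        elif attrs and not tags:
            drop_attrs.update(attrs)      # attributes only: remove everywhere
        else:
            for tag in tags:              # both: remove only on matching tags
                per_tag.setdefault(tag, set()).update(attrs)
    return drop_tags, drop_attrs, per_tag


def clean_style(value, style_whitelist):
    """Keep only whitelisted properties (assumption: names are property names).

    Splitting on ';' is a simplification and ignores ';' inside values.
    """
    kept = []
    for declaration in value.split(";"):
        name, sep, val = declaration.partition(":")
        if sep and name.strip().lower() in style_whitelist:
            kept.append("%s: %s" % (name.strip(), val.strip()))
    return "; ".join(kept)


def clean_class(value, class_blacklist):
    """Drop blacklisted class names from a 'class' attribute value."""
    return " ".join(c for c in value.split() if c not in class_blacklist)


def clean_attrs(tag, attrs, rules, style_whitelist=(), class_blacklist=()):
    """Return the cleaned attributes of one tag, or None if the tag is excluded."""
    drop_tags, drop_attrs, per_tag = rules
    if tag in drop_tags:
        return None
    blocked = drop_attrs | per_tag.get(tag, set())
    cleaned = {}
    for name, value in attrs.items():
        if name in blocked:
            continue
        if name == "style":
            value = clean_style(value, style_whitelist)
            if not value:
                continue                  # nothing whitelisted is left
        elif name == "class":
            value = clean_class(value, class_blacklist)
            if not value:
                continue                  # every class was blacklisted
        cleaned[name] = value
    return cleaned


# Illustrative rules: drop <script>, drop onclick everywhere, drop target on <a>.
rules = compile_exclusions([
    (["script"], []),
    ([], ["onclick"]),
    (["a"], ["target"]),
])
```

With these rules, clean_attrs("a", {"href": "/x", "target": "_blank"}, rules) keeps only href, while the same target attribute on a p tag is left alone because the third tuple names only a.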

An instance of the SoupStrainer class can be called directly with one argument. If the argument is a string, it is parsed internally by BeautifulSoup and the result is returned as unicode. If it is an HTML tree already parsed by BeautifulSoup, the tree is modified in place and returned.
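
The calling convention described above (string in, string out; tree in, same tree out) can be sketched as follows. This is a hypothetical stand-in, not the SoupStrainer class itself: it uses xml.etree.ElementTree from the standard library in place of BeautifulSoup, so it only accepts well-formed markup, and the class name SketchStrainer and its drop_attrs argument are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class SketchStrainer:
    """Hypothetical stand-in that shows only the calling convention."""

    def __init__(self, drop_attrs=()):
        self.drop_attrs = set(drop_attrs)

    def _clean(self, root):
        # Remove the configured attributes from the root and all descendants.
        for element in root.iter():
            for name in self.drop_attrs & set(element.attrib):
                del element.attrib[name]

    def __call__(self, markup):
        if isinstance(markup, str):
            # String input: parse, clean, and serialize back to a string.
            root = ET.fromstring(markup)
            self._clean(root)
            return ET.tostring(root, encoding="unicode")
        # Tree input: clean in place and return the same object.
        self._clean(markup)
        return markup


strainer = SketchStrainer(["onclick"])
cleaned_text = strainer('<p onclick="x()">hi</p>')

tree = ET.fromstring('<p onclick="x()">hi</p>')
same_tree = strainer(tree)
```

Returning the same tree object lets a caller that has already parsed a document avoid a second parse and keep its existing references to nodes.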

Changelog

1.0 - 2008-11-14
  • Initial release

Last updated Jan 5th, 2011
