pypm install mr.importer

How to install mr.importer

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install mr.importer
Python versions: 2.7, 3.2, 3.3

Platform            Version   Status
Windows (32-bit)    1.0a5     Available
Windows (64-bit)    1.0a5     Available
Mac OS X (10.5+)    1.0a5     Available
Linux (32-bit)      1.0a5     Available
Linux (64-bit)      1.0a5     Available

Latest release
version 1.0a5 on Feb 6th, 2011


Easily import static websites on the file system into Plone via a command like:

$ bin/plone run bin/import /path/to/files

mr.importer is a Buildout recipe that creates a script for you to easily get content from static HTML websites on the file system into Plone.


This is a Buildout recipe for use with Plone; by itself it does nothing. If you don't know what Plone is, please see: http://plone.org. If you don't know what Buildout is, please see: http://www.buildout.org/.

Getting started

First, a couple of caveats:

  • A Plone site object must exist in the Zope2 instance database. By default in mr.importer, that site object is assumed to be named "Plone".
  • An admin user must exist in the Zope2 instance database (or Plone site). By default in mr.importer that user is assumed to be named "admin".

And because it drives the author nuts whenever he has to dig for a recipe's options, here are this recipe's options with sample values:

recipe = mr.importer

# core features
path = /Plone
user = admin
illegal_chars = _ . +
illegal_words =
illegal_expressions =
html_extensions = html
image_extensions = png
file_extensions = mp3
target_tags = p

# additional features
force = false
publish = false
collapse = false
create_spreadsheet = false
replacetypes =
rename =
match =
paths =


The parameters listed above are configured with their default values. Edit these values if you would like to change the default behavior; they are (mostly) self-explanatory. Now you can just cut and paste to get started or keep reading if you would like to know more.


This recipe creates a script that is not intended to be run directly. Due to technical limitations, the author was not able to implement a user-friendly error message, so if you run bin/import directly you will see this:

$ bin/import
Traceback (most recent call last):
  File "bin/import", line 116, in <module>
    mr.importer.main(app=app, path='/Plone', illegal_chars='_,.',
illegal_words='id,start', illegal_expressions='[0-9]', html_extensions='html',
image_extensions='gif,jpg,jpeg,png', file_extensions='mp3,xls',
target_tags='a,div,font,h1,h2,p', force=True, publish=False, collapse=False,
rename=None, replacetypes=None, match=None, create_spreadsheet=True)
NameError: name 'app' is not defined

To avoid this, run the script as intended:

$ bin/plone run bin/import /path/to/files

See the execution section below for more information.


You can install mr.importer by editing your buildout.cfg file like so. First add an import section:

[import]
recipe = mr.importer

Then add the import section to the list of parts:

[buildout]
parts =
    ...
    import

Now run bin/buildout as usual.


The section name import is arbitrary; you can call it whatever you want. Just keep in mind that the section name corresponds directly to the script name. In other words, whatever you name the section is what the script will be called.


Now you can run mr.importer like this:

$ bin/plone run bin/import /path/to/files


In the example above and examples below, bin/plone refers to a Zope 2 instance script created by plone.recipe.zope2instance.

Your bin/plone script may be called bin/instance or bin/client, etc. instead.


If you have a site in /var/www/html that contains the following:


You should run:

$ bin/plone run bin/import /var/www/html

And the following will be created:


Modifying the default behavior of mr.importer is easy; just use the command line options or add parameters to your buildout.cfg file. Both approaches allow customization of the exact same set of options, but the command line arguments will trump any settings found in your buildout.cfg file.
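
The precedence described above (defaults, then buildout.cfg, then the command line) can be sketched as a simple dictionary merge. This is only an illustration: merge_settings() and the DEFAULTS dict are hypothetical names, not mr.importer's actual code, and the values are the sample ones from this page.

```python
# Hypothetical defaults, copied from the sample values on this page;
# this is not mr.importer's real settings dictionary.
DEFAULTS = {"path": "/Plone", "user": "admin", "force": False, "publish": False}


def merge_settings(defaults, buildout_options, cli_options):
    """Return one settings dict; later sources override earlier ones."""
    settings = dict(defaults)
    settings.update(buildout_options)
    # Only options actually given on the command line (not None) override.
    settings.update({k: v for k, v in cli_options.items() if v is not None})
    return settings


# Values as they might come from buildout.cfg and from the command line.
buildout_options = {"path": "/Plone/foo", "publish": True}
cli_options = {"path": "/Plone/bar", "publish": None}

settings = merge_settings(DEFAULTS, buildout_options, cli_options)
print(settings)
```

Here path ends up as /Plone/bar because the command line wins, while publish keeps the buildout.cfg value because it was not given on the command line.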

Buildout options

You can configure the following parameters in your buildout.cfg file in the mr.importer recipe section.

Parameter | Default value | Description
path | /Plone | Specify an alternate location in the database for the import to occur.
user | admin | Specify an alternate user to import content with.
illegal_chars | _ . | Specify illegal characters. mr.importer will ignore files that contain these characters.
html_extensions | html | Specify HTML file extensions. mr.importer will import HTML files with these extensions.
image_extensions | png, gif, jpg, jpeg | Specify image file extensions. mr.importer will import image files with these extensions.
file_extensions | mp3, xls | Specify generic file extensions. mr.importer will import files with these extensions as files in Plone (unless you configure create_spreadsheet=true, see below).
target_tags | a h1 h2 p | Specify target tags. mr.importer will parse the contents of the HTML tags listed. If any tag is provided as an XPath expression (any expression beginning with /), the matching elements will first be extracted from the root document. Selections for the contents of other tags will then be performed only on that document subset. If only XPath expressions are given, then the entire subtree of each matched element is returned (including HTML).
force | false | Force create folders that do not exist. For example, if you do --path=/Plone/foo and foo does not exist, you will get an error message. Use --force to tell mr.importer to create it.
publish | false | Publish newly created content.
collapse | false | "Collapse" content (see collapse_parts() in mr.importer.py).
rename | (empty) | Rename content (see rename_parts() in mr.importer.py).
replacetypes | (empty) | Use custom types (see replace_types()).
match | (empty) | Match files (see match_files()).
paths | (empty) | Specify a series of locations on the filesystem, with corresponding locations in the database for imports, with syntax: --paths=import_dirs:object_paths (--path will be ignored).
create_spreadsheet | false | Create "spreadsheets" (see create_spreadsheet() in mr.importer.py).
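
To make the interplay of illegal_chars and the three extension lists more concrete, here is a minimal sketch of how a file name might be sorted into "html", "image", "file", or skipped. The classify() function is hypothetical and is not taken from mr.importer; in particular, checking illegal characters against the name without its extension is an assumption made for this example.

```python
import os

# Default values from the parameter list on this page.
ILLEGAL_CHARS = ["_", "."]
HTML_EXTENSIONS = ["html"]
IMAGE_EXTENSIONS = ["png", "gif", "jpg", "jpeg"]
FILE_EXTENSIONS = ["mp3", "xls"]


def classify(filename):
    """Return 'html', 'image', 'file', or None if the file is skipped."""
    base, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()
    # Assumption: illegal characters are looked for in the base name only,
    # otherwise the "." before every extension would exclude every file.
    if any(char in base for char in ILLEGAL_CHARS):
        return None
    if ext in HTML_EXTENSIONS:
        return "html"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in FILE_EXTENSIONS:
        return "file"
    return None  # unknown extension: not imported


for name in ["index.html", "my_page.html", "logo.PNG", "song.mp3", "notes.txt"]:
    print(name, "->", classify(name))
```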

Instead of accepting the default mr.importer behaviour, in your buildout.cfg file you may specify the following:

recipe = mr.importer
path = /Plone/foo
html_extensions = htm
image_extensions = png
target_tags = p

This will configure mr.importer to (only) import content from:

  • Images ending in .png
  • HTML files ending in .htm
  • Text within p tags

into:

  • A folder named /Plone/foo.

Command line options

The following mr.importer command line options are supported.

'--path', '-p'

You can specify an alternate import path ('/Plone' by default) with --path or -p:

$ bin/plone run bin/import /path/to/files --path=/Plone/foo

You can specify HTML file extensions with the --html-extensions option:

$ bin/plone run bin/import /path/to/files --html-extensions=htm

You can specify image file extensions with the --image-extensions option:

$ bin/plone run bin/import /path/to/files --image-extensions=png

You can specify generic file extensions with the --file-extensions option:

$ bin/plone run bin/import /path/to/files --file-extensions=pdf

You can specify the target tags to parse with the --target-tags option:

$ bin/plone run bin/import /path/to/files --target-tags=p
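
As a rough illustration of what "parsing the contents of target tags" means, the sketch below collects the text found inside a given set of tags. mr.importer itself relies on lxml; this example swaps in the standard library's html.parser so it runs without extra packages, and it does not cover the XPath form of target_tags. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TargetTagText(HTMLParser):
    """Collect the text found inside any of the target tags."""

    def __init__(self, target_tags):
        super().__init__()
        self.target_tags = set(target_tags)
        self.depth = 0    # number of target tags currently open
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.target_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.target_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while inside at least one target tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, target_tags):
    parser = TargetTagText(target_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p><div>skip</div></body></html>"
print(extract_text(page, ["p"]))
```

With target tags limited to p, the heading and the div are ignored, while text nested inside the paragraph (including the b element) is kept.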

Force create folders that do not exist.


Publish newly created content.


"collapse" content (see collapse_parts() in mr.importer.py).


Rename content (see rename_files()).


Customize types (see replace_types() in mr.importer.py).


Match files (see match_files() in mr.importer.py).


You can specify a series of import paths and corresponding object paths:

$ bin/plone run bin/import --paths=sample:Plone/sample,sample2:Plone/sample2
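
The --paths value is a comma-separated list of import_dir:object_path pairs. A minimal sketch of how such a string could be split is shown below; parse_paths() is a hypothetical helper written for this example, not a function from mr.importer.

```python
def parse_paths(value):
    """Split 'dir:obj,dir2:obj2' into (import_dir, object_path) pairs."""
    pairs = []
    for item in value.split(","):
        import_dir, sep, object_path = item.partition(":")
        # Reject items that lack the colon or either half of the pair.
        if not sep or not import_dir.strip() or not object_path.strip():
            raise ValueError("expected import_dir:object_path, got %r" % item)
        pairs.append((import_dir.strip(), object_path.strip()))
    return pairs


pairs = parse_paths("sample:Plone/sample,sample2:Plone/sample2")
for import_dir, object_path in pairs:
    print(import_dir, "->", object_path)
```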

You can optionally tell mr.importer to try and import the contents of any spreadsheets it finds, by doing this:

$ bin/plone run bin/import --create-spreadsheet /var/www/html

If /var/www/html/foo.xls exists and has content, then http://localhost:8080/Plone/foo will be created as a page containing the contents of the spreadsheet in an HTML table.
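
The last step of that conversion, turning rows of cells into an HTML table, might look roughly like the sketch below. How mr.importer actually reads the .xls file is not shown on this page, so the rows here are hard-coded sample data and rows_to_html_table() is a hypothetical helper.

```python
from html import escape


def rows_to_html_table(rows):
    """Render a list of rows (each a list of cell values) as an HTML table."""
    lines = ["<table>"]
    for row in rows:
        # Escape each cell so characters like & and < stay valid HTML.
        cells = "".join("<td>%s</td>" % escape(str(cell)) for cell in row)
        lines.append("  <tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)


# Sample data standing in for the contents of a spreadsheet.
rows = [["Name", "Plays"], ["Intro & Outro", 12]]
table = rows_to_html_table(rows)
print(table)
```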


And lastly, you can always ask mr.importer to tell you about its available options with the --help or -h option:

$ bin/plone run bin/import -h

Instead of accepting the default mr.importer behaviour, on the command line you may specify the following:

$ bin/plone run bin/import /path/to/files -p /Plone/foo --html-extensions=htm \
    --image-extensions=png --target-tags=p

This will configure mr.importer to (only) import content from:

  • Images ending in .png
  • HTML files ending in .htm
  • Text within p tags

into:

  • A Plone site folder named /Plone/foo.

Here are some troubleshooting tips.

Compiling lxml

mr.importer requires lxml, which in turn requires libxml2 and libxslt. If you do not have lxml installed "globally" (i.e. in your system Python's site-packages directory), then Buildout will try to install it for you. At that point lxml will look for the libxml2/libxslt development libraries to build against. If they are not already installed on your system, your mileage may vary (i.e. Buildout will fail).

Database access

Before running mr.importer, you must either stop your Plone site or use ZEO. Otherwise mr.importer will not be able to access the database.


Questions, comments, or concerns? Please e-mail: aclark@aclark.net.


Development sponsored by Radio Free Asia

1.0a5 (02/05/2011)
  • Rename parse2plone to mr.importer
    • Repackage as needed
  • Switch to kwargs in main()
    • Better _SETTINGS handling
  • Add support for illegal_expressions check
  • Add "Keep going!" feature (to ignore errors)
  • Add all HTML4 tags to target_tags
1.0a4 (01/12/2011)
  • Remove Plone dep
1.0a3 (11/17/2010)
  • Bug fix: TypeError: join() takes exactly one argument (2 given), related to specifying the import dir on the command line (as args[0]), fixed
  • Fix tests
1.0a2 (11/17/2010)
  • Add spreadsheet import feature
  • Fix docs
1.0a1 (11/17/2010)
  • Moved development to the (experimental) collective on Github
0.9.9 (11/16/2010)
  • Added a large number of tests; performed associated refactoring; 50% test coverage
0.9.8 (11/12/2010)
  • Add "paths" feature to allow multi-import dirs (on the file system), and corresponding object paths (in Plone) to be specified.
0.9.7 (11/08/2010)
  • Fix import error
  • Add file handler to logger; saves output to a file called "parse2plone.log"
0.9.6 (11/08/2010)
  • Fixes to "match" feature
  • Combine all modules into one
  • Remove a stray pdb (!)
  • Add tests (we're at 20% test coverage people!)
  • Update docs
0.9.5 (11/08/2010)
  • Add match feature
  • Add more project justifications to the docs
0.9.4 (11/06/2010)
  • Remove bin/import script whenever recipe is uninstalled [aclark4life]
  • Add support for XPath syntax in target_tags [derek]
  • Add "typeswap" feature [aclark4life]
  • Update docs [aclark4life]
0.9.3 (11/04/2010)
  • Add Plone 2.5 compat
  • Bug fixes
    • Better handling of file system path; better base dir calculation
0.9.2 (11/03/2010)
  • More doc fixes
0.9.1 (11/03/2010)
  • Doc fixes
0.9.0 (11/03/2010)
  • Fix regressions introduced in (or unresolved as of) 0.8.2. Thanks Derek Broughton for the bug report(s)
    • Many fixes to convert_parameter_values() method which converts recipe parameters to arguments passed to main()
    • Fix "slugify" feature
0.8.2 (11/02/2010)
  • Add rename feature
  • Fix regressions introduced in 0.8.1
0.8.1 (10/29/2010)
  • Refactor options/parameters functionality to universally support _SETTINGS dict
  • Add "slugify" feature
  • Doc fixes
  • Add support to optionally publish content after creation
  • Add support for generic file import
0.8 (10/27/2010)
  • Support the importing of content to folders within the Plone site object
0.7 (10/25/2010)
  • Documentation fixes
0.6 (10/25/2010)
  • Support customization via recipe parameters and command line arguments
0.5 (10/22/2010)
  • Revert 'Add Plone to install_requires'
0.4 (10/22/2010)
  • Add 'Plone' to install_requires
0.3 (10/22/2010)
  • Another setuptools fix
0.2 (10/22/2010)
  • Setuptools fix
0.1 (10/21/2010)
  • Initial release

Last updated Feb 6th, 2011
