
How to install reductio

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install reductio
License: AGPLv3
Latest release: version 0.2.1 on Dec 6th, 2011

Because mapping and reducing isn't supposed to be hard.

What is reductio?

Reductio is a minimalistic map-reduce framework for Python. It runs on top of Fabric and setuptools, which you might already use to get your code onto other machines.

It has no database. It has no distributed filesystem. It uses no server other than sshd. Because of this, it has essentially no memory requirement!

Reductio is designed for disk-bound big data tasks, which is what many big data tasks are. If the pieces you need to map-reduce fit entirely in the RAM of your worker computers, you are paying a huge premium for that. And if they don't, a system that tries to buffer things in RAM is going to be wasting all its effort. At some point, you won't see your data again unless you write it to the damn disk, and that's what is going to take most of the time.

I created reductio out of absolute necessity, so I could start crunching my data. You might notice its documentation is practically nonexistent at the moment.

What Reductio does

Reductio extends Fabric (http://fabfile.org). It is meant to support the following approximate process:

  • Set up the appropriate Python environment on all your worker machines, including required packages, a place to keep the data, and an up-to-date version of the code you want to run.
  • Tell each worker machine how to contact all the other machines, so they can send their data onward ("scatter") when they're done with it.
  • Give each worker machine some Python functions to run over all the data. These will often be "maps" or "reduces".
  • Collect results and group the things that belong together using Unix's extremely optimized sort command.
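
To make the "scatter" and "sort and group" steps more concrete, here is a minimal standalone sketch. It is not reductio's code: the function names, the tab-separated record format, and the choice of a CRC32 hash to pick a destination worker are all assumptions made for illustration. It only assumes a Unix sort command on the PATH.

```python
import itertools
import os
import subprocess
import zlib


def pick_worker(key, n_workers):
    """Assumed routing rule: a stable hash, so equal keys share a worker."""
    return zlib.crc32(key.encode("utf-8")) % n_workers


def scatter(records, n_workers):
    """Split (key, value) records into one outgoing list per worker."""
    outboxes = [[] for _ in range(n_workers)]
    for key, value in records:
        outboxes[pick_worker(key, n_workers)].append((key, value))
    return outboxes


def sort_and_group(records):
    """Sort "key<TAB>value" lines with Unix sort, then group them by key.

    Assumes keys and values contain no tabs or newlines.
    """
    text = "".join("%s\t%s\n" % (key, value) for key, value in records)
    result = subprocess.run(
        ["sort"],
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
        env=dict(os.environ, LC_ALL="C"),  # plain byte-order comparison
    )
    pairs = [line.split("\t", 1) for line in result.stdout.splitlines()]
    grouped = {}
    # After sorting, lines with the same key are adjacent, so groupby works.
    for key, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        grouped[key] = [value for _, value in group]
    return grouped


records = [("an", 1), ("ba", 1), ("an", 1)]
outboxes = scatter(records, 3)
grouped = sort_and_group(records)
```

Values come back as strings because they pass through the text-based sort; a reduce step would convert them as needed.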

Example

reductio/example/fabfile.py is an example of how to count all the letter bigrams (pairs of adjacent letters) in the ubiquitous "words" file, and aggregate them into a table of frequencies.
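
The bundled fabfile is not reproduced here. As a rough, single-machine illustration of the bigram-counting idea itself, the sketch below maps each word to its letter bigrams and reduces them into a frequency table. The function names and the two-word input are made up for this sketch and are not taken from reductio/example/fabfile.py.

```python
from collections import Counter


def map_bigrams(word):
    """Yield each pair of adjacent letters in one word."""
    word = word.strip().lower()
    for i in range(len(word) - 1):
        pair = word[i:i + 2]
        if pair.isalpha():  # skip pairs containing apostrophes, digits, etc.
            yield pair


def reduce_counts(bigrams):
    """Aggregate a stream of bigrams into a frequency table."""
    return Counter(bigrams)


def count_bigrams(words):
    """Run the map step over every word, then reduce the combined output."""
    return reduce_counts(b for word in words for b in map_bigrams(word))


# A tiny stand-in for the "words" file.
table = count_bigrams(["banana", "band"])
print(sorted(table.items()))
# prints [('an', 3), ('ba', 2), ('na', 2), ('nd', 1)]
```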

Running tasks

If you have a task defined as the function do_stuff in mymodule.py, but first you want to run setup to make sure code and other things are up to date, you would run them both with this command:

fab -f mymodule setup do_stuff
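
For orientation, here is a hypothetical skeleton of what such a mymodule.py could look like. It assumes Fabric 1.x's classic task discovery, in which public functions in the fabfile are exposed as tasks; the task bodies are placeholders, and the steps_run list exists only so the sketch can show the order of execution.

```python
# mymodule.py (hypothetical contents; task bodies are placeholders)

steps_run = []  # illustration only: records which tasks have run


def setup():
    """Make sure code and other things are up to date."""
    steps_run.append("setup")


def do_stuff():
    """The actual job; expects setup to have run first."""
    steps_run.append("do_stuff")


if __name__ == "__main__":
    # Imitates `fab -f mymodule setup do_stuff` without Fabric installed:
    # tasks named on the command line run left to right.
    for task_name in ["setup", "do_stuff"]:
        globals()[task_name]()
    print(steps_run)
```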

Why not Hadoop?

Face it, if you knew how to configure Hadoop and get your code to run with it, you'd be doing so right now.

Also, Hadoop is written for Java programmers, and Python is distinctly a second-class citizen in its world. Hadoop seems to think all Python code takes the form of standalone scripts with no dependencies, which perhaps says something about what Java programmers think of Python.

Reductio recognizes that none of this is going to work unless you have the right Python setup, so it builds on tools that Python programmers already use to deploy their code.

Why not Disco?

I approve of the Disco project (http://discoproject.org) and its goal of creating a map-reduce ecosystem designed around Python, but I find it too complex and too "magical" at the moment.

It makes it difficult to understand what is going on in its internals, and yet you have to understand its internals when something goes wrong or when you want to do something the designers didn't expect.
