pypm install pbot

How to install pbot

  1. Download and install ActivePython
  2. Open Command Prompt
  3. Type pypm install pbot
Python 2.7 | Python 3.2 | Python 3.3

Windows (32-bit): 1.4.0, 1.3.0, 1.1.0 available
Windows (64-bit): 1.4.0, 1.3.0, 1.1.0 available
Mac OS X (10.5+): 1.4.0, 1.3.0, 1.1.0 available
Linux (32-bit): 1.4.0, 1.3.0, 1.1.0 available
Linux (64-bit): 1.4.0, 1.3.0, 1.1.0 available

License: GPL
Latest release: version 1.4.0 on Mar 2nd, 2011

Pbot contains two modules, Bot and Spider.

Bot is a simple helper that saves request state (cookies, referrer) between HTTP requests. It also provides additional methods for adding cookies. Because it has no dependencies, this module is easy to use when you need to simulate a browser.

Spider is pbot armed with lxml (required). It provides additional methods for easy website crawling; see below.

Bot is very easy to use:

    from pbot.pbot import Bot

    # You can provide proxies during bot creation, or set them later as bot.proxies
    bot = Bot(proxies={'http': 'localhost:3128'})
    bot.add_cookie({'name': 'sample', 'value': 1, 'domain': 'example.com'})
    response = bot.open('http://example.com')  # Open with cookies and empty referrer
    bot.follow('http://google.com')  # Open google with example.com as a referrer
    response = bot.response  # Response saved, and can be read later
    bot.follow('http://example.com', post={'q': 'abc'})  # You can provide post and get as keyword arguments
    bot.refresh_connector()  # Flush cookies and referrer
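To make the idea of "saved request state" concrete, here is a minimal sketch built only on the Python 3 standard library. It is not pbot's implementation: the class name StatefulClient and its methods are hypothetical, chosen for illustration. It shows one plausible way to carry cookies and the referrer from one request to the next.

```python
import http.cookiejar
import urllib.request


class StatefulClient:
    """Hypothetical helper, not part of pbot: keeps cookies and the last URL."""

    def __init__(self):
        self.cookies = http.cookiejar.CookieJar()
        # The opener stores cookies from responses and sends them back later.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies)
        )
        self.last_url = None  # becomes the Referer of the next request

    def build_request(self, url):
        """Create a request that carries the previous URL as its Referer."""
        request = urllib.request.Request(url)
        if self.last_url is not None:
            request.add_header("Referer", self.last_url)
        return request

    def open(self, url):
        """Fetch url, then remember it as the referrer for the next call."""
        response = self.opener.open(self.build_request(url))
        self.last_url = response.geturl()
        return response

    def reset(self):
        """Forget cookies and referrer."""
        self.cookies.clear()
        self.last_url = None
```

Separating build_request from open keeps the referrer logic inspectable without touching the network; reset plays roughly the role that the comment on refresh_connector describes.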

Spider gives you special features:

    from pbot.spider import Spider

    bot = Spider()  # or Spider(force_encoding='utf-8') to force encoding for parser
    bot.open('http://example.com')
    # The lxml tree can be accessed via .tree; the response is automatically
    # read and parsed by lxml.html
    bot.tree.xpath('//a')
    form = bot.xpath('//form[@id="main"]')  # xpath shortcut for bot.tree.xpath
    bot.submit(form)  # Submit lxml form

Crawler: bot.crawl recursively crawls from the target page, yielding xml_tree, query_url, real_url (real_url is the url after all redirects). Its signature:

    crawl(self,
          url=None,              # Target url to start crawling
          check_base=True,       # Yield pages only on domain from url
          only_descendant=True,  # Yield only pages whose urls start with url
          max_level=None,        # Maximum level
          allowed_protocols=('http:', 'https:'),
          ignore_errors=True,
          ignore_starts=(),      # Tuple/array; ignore urls that start with ignore_starts (exclude some parts of site)
          check_mime=())
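The crawl parameters above amount to a set of URL filters. The sketch below is one reading of their comments, not pbot's actual code: the function should_visit is hypothetical, and the exact matching rules (for example, comparing host names for check_base) are assumptions.

```python
from urllib.parse import urlsplit


def should_visit(link, start_url, check_base=True, only_descendant=True,
                 allowed_protocols=("http:", "https:"), ignore_starts=()):
    """Hypothetical filter mirroring the documented crawl options."""
    parts = urlsplit(link)
    start = urlsplit(start_url)
    # allowed_protocols: compare the scheme in the same 'http:' form.
    if parts.scheme + ":" not in allowed_protocols:
        return False
    # check_base (assumed meaning): stay on the start url's host.
    if check_base and parts.netloc != start.netloc:
        return False
    # only_descendant: keep only urls that start with the start url.
    if only_descendant and not link.startswith(start_url):
        return False
    # ignore_starts: skip excluded parts of the site.
    if any(link.startswith(prefix) for prefix in ignore_starts):
        return False
    return True
```

For instance, with a start url of http://example.com/docs/, a link to http://example.com/blog/ is rejected unless only_descendant is set to False.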

Last updated Mar 2nd, 2011
