Convert string to hex (Python)

2006-08-18T05:17:53-07:00

Python recipe 496969 by Mykola Kharechko (web).

Quote a string by converting each character to its hex representation, and convert it back.
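The recipe body is not reproduced in this listing, so the following is only a minimal Python 3 sketch of the round trip it describes. The function names `to_hex` and `from_hex` and the UTF-8 default are assumptions made for illustration, not the recipe's own code.

```python
def to_hex(text, encoding="utf-8"):
    """Return the hex representation of the encoded bytes of `text`."""
    return text.encode(encoding).hex()


def from_hex(hex_string, encoding="utf-8"):
    """Inverse of to_hex: turn a hex string back into text."""
    return bytes.fromhex(hex_string).decode(encoding)
```

Each byte becomes two hex digits, so `to_hex("abc")` gives `"616263"`, and `from_hex` reverses it.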

Http client to POST using multipart/form-data (Python)

2002-08-23T07:56:39-07:00

Python recipe 146306 by Wade Leftwich (web).

A scripted web client that will post data to a site as if from a form using ENCTYPE="multipart/form-data". This is typically used to upload files, but also gets around a server's (e.g. ASP's) limitation on the amount of data that can be accepted via a standard POST (application/x-www-form-urlencoded).
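As a rough illustration of what such a client has to build, here is a Python 3 sketch that assembles a multipart/form-data body by hand. The helper name `encode_multipart`, its parameters, and the fixed `application/octet-stream` file type are assumptions for this example and are not taken from the recipe.

```python
import uuid


def encode_multipart(fields, files, boundary=None):
    """Build (content_type, body) for a multipart/form-data POST.

    fields: dict mapping field name -> string value
    files:  list of (field_name, filename, content_bytes) tuples
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, filename, content in files:
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode("utf-8")
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    # The closing delimiter carries two extra trailing dashes.
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body
```

The returned `content_type` goes into the request's Content-Type header and `body` is sent as the POST data.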

Simple HTTP server supporting SSL secure communications (Python)

2008-08-02T16:04:56-07:00

Python recipe 442473 by Sebastien Martini (https, openssl, ssl, web). Revision 8.

This recipe describes how to set up a simple HTTP server supporting SSL secure communications. It extends the SimpleHTTPServer standard module to support the SSL protocol. With this recipe, only the server is authenticated while the client remains unauthenticated (i.e. the server will not request a client certificate). Thus, the client (typically the browser) will be able to verify the server identity and secure its communications with the server.

This recipe assumes you already know the basics of SSL and how to set up OpenSSL. It is mostly derived from the examples provided with the pyOpenSSL package.

In order to apply this recipe, follow these few steps:

1) Install the OpenSSL package in order to generate the key and certificate. Note: you probably already have this package installed if you are on Linux or *BSD.
2) Install the pyOpenSSL package, which is a binding to the OpenSSL library. You'll need to import this module to access OpenSSL's components.
3) Generate a self-signed certificate, consisting of a certificate and a private key, for your server with the following command (it writes both to a single file named server.pem): openssl req -new -x509 -keyout server.pem -out server.pem -days 365 -nodes
4) Assuming you saved this recipe in SimpleSecureHTTPServer.py, start the server (with the appropriate rights): python SimpleSecureHTTPServer.py
5) Finally, browse to https://localhost, or https://localhost:port if your server listens on a port other than 443.
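The recipe itself is written against pyOpenSSL and SimpleHTTPServer. For comparison, the same setup can be sketched in Python 3 using only the standard-library `ssl` and `http.server` modules; this is a swapped-in technique, not the recipe's code. The function name `make_https_server` and the default port 4443 are assumptions; `certfile` is expected to be a combined key-and-certificate file such as the server.pem generated above.

```python
import http.server
import ssl


def make_https_server(certfile, host="localhost", port=4443):
    """Return an HTTPServer whose listening socket is wrapped in TLS.

    Only the server is authenticated; no client certificate is requested.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # A combined PEM file holds both the private key and the certificate.
    context.load_cert_chain(certfile)
    httpd = http.server.HTTPServer((host, port), http.server.SimpleHTTPRequestHandler)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    return httpd
```

Calling `make_https_server("server.pem").serve_forever()` would then serve the current directory over HTTPS until interrupted.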

Minimal http upload cgi (Python)

2004-03-20T01:47:04-08:00

Python recipe 273844 by Noah Spurrier (web). Revision 3.

This is a bare-bones cgi file upload. It will display an upload form and save the uploaded files to disk.

Simple Web Crawler (Python)

2011-01-31T21:57:58-08:00

Python recipe 576551 by James Mills (crawler, network, parsing, web). Revision 2.

NOTE: This recipe has been updated with suggested improvements since the last revision.

This is a simple web crawler I wrote to test websites and links. It will traverse all links found to any given depth.

See --help for usage.

I'm posting this recipe as this kind of problem has been asked on the Python Mailing List a number of times... I thought I'd share my simple little implementation based on the standard library and BeautifulSoup.

--JamesMills

HTTP basic authentication (Python)

2004-10-05T14:05:47-07:00

Python recipe 305288 by Michael Foord (web). Revision 3.

A script demonstrating how to manually do basic authentication over http.
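Doing basic authentication manually amounts to sending an Authorization header that contains the Base64 encoding of "username:password". A minimal Python 3 sketch follows; the helper names `basic_auth_header` and `build_request` are hypothetical, and the recipe itself may be structured differently.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of the Authorization header for HTTP basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


def build_request(url, username, password):
    """Return a urllib Request with the Authorization header already set."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The request can then be passed to `urllib.request.urlopen`. Note that Base64 is an encoding, not encryption, so the credentials are only protected if the connection itself is.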

A simple XML-RPC server (Python)

2001-10-13T11:34:19-07:00

Python recipe 81549 by Brian Quinlan (web).

This recipe demonstrates the creation of a simple XML-RPC server using the SimpleXMLRPCServer class. It requires either Python 2.2 or later or the XML-RPC package from PythonWare (http://www.pythonware.com/products/xmlrpc/index.htm) to run.
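In Python 3 the class lives in `xmlrpc.server`. The sketch below registers a single function and makes one round trip on a local port; the function `add`, the helpers `make_server` and `demo_round_trip`, and the port numbers are all illustrative assumptions rather than the recipe's code.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(x, y):
    """Example remote procedure."""
    return x + y


def make_server(host="127.0.0.1", port=8000):
    """Create an XML-RPC server exposing `add`."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add, "add")
    return server


def demo_round_trip():
    """Serve exactly one request on a free port and return the client's result."""
    server = make_server(port=0)  # port 0 lets the OS pick a free port
    server.timeout = 5            # do not wait forever if no request arrives
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
        return proxy.add(2, 3)
    finally:
        worker.join()
        server.server_close()
```

A long-running server would call `make_server().serve_forever()` instead of handling a single request.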

HTMLTags - generate HTML in Python (Python)

2009-10-24T10:30:38-07:00

Python recipe 366000 by Pierre Quentel (web). Revision 11.

The HTMLTags module defines a class for each valid HTML tag, written in uppercase letters. To create a piece of HTML, the general syntax is :

t = TAG(innerHTML, key1=val1,key2=val2,...)

so that "print t" results in :

<TAG key1="val1" key2="val2" ...>innerHTML</TAG>

For instance :

print A('bar', href="foo") ==> <A href="foo">bar</A>
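The idea can be sketched in a few lines of Python 3. This is not the HTMLTags module itself, only a minimal illustration under the assumption that each tag is a subclass of a common `TAG` base class whose class name is used as the tag name; only two tags are defined here.

```python
class TAG:
    """Base class: the subclass name is used as the HTML tag name."""

    def __init__(self, inner_html="", **attrs):
        self.inner_html = inner_html
        self.attrs = attrs

    def __str__(self):
        name = self.__class__.__name__
        # Keyword arguments keep their call order (Python 3.7+).
        attributes = "".join(f' {key}="{value}"' for key, value in self.attrs.items())
        return f"<{name}{attributes}>{self.inner_html}</{name}>"


class A(TAG):
    pass


class B(TAG):
    pass
```

Because `inner_html` is formatted with `str()`, tags nest naturally: `str(B(A('bar', href="foo")))` yields `<B><A href="foo">bar</A></B>`.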

E-mail Address Validation (Python)

2001-07-27T13:37:26-07:00

Python recipe 65215 by Mark Nenadov (web). Revision 5.

This function simply validates an e-mail address. Ignore this recipe and go to my "StringValidator" recipe, which is a much better solution.

SSL Client Authentication over HTTPS (Python)

2002-02-28T13:43:27-08:00

Python recipe 117004 by Rob Riggs (web).

A 16-line python application that demonstrates SSL client authentication over HTTPS. We also explain the basics of how to set up Apache to require SSL client authentication. This assumes at least Python-2.2 compiled with SSL support, and Apache with mod_ssl.

Calculating the distance between zip codes (Python)

2006-04-25T20:40:00-07:00

Python recipe 393241 by Kevin Ryan (web). Revision 2.

I came across the mention of a formula labeled "The Great Circle Distance Formula" that purported to calculate the distance between any two points on the earth given their longitude and latitude points (the reference was in a Linux Magazine article). So, I looked up some information and cooked up a Python version of the calculation. There are references in the code where you can obtain approximate zip code data for free (e.g., if you wanted to enhance your website by adding a "Search within x mi's" feature), as well as references to the GCDF if you have further interest. Enjoy!

04/25/2006 update: I've decided to update this recipe with an object oriented bent where the information is cached once the object is instantiated. I've also added command line access to automatically download the zipcode file from the census website (just use 'python zips.py -d' and it will download a copy to your harddrive under 'zips.txt'). Lastly, I've added some unit testing so that if any future changes are made this will automatically run and tell me if anything pops out as wrong.
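One common way to compute a great-circle distance is the haversine formula, sketched below in Python 3. Whether the recipe uses this exact form is not stated here; the function name, the mean Earth radius of 3958.8 miles, and the decimal-degree inputs are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius; the Earth is not a perfect sphere


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Distance between two points given in decimal degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))
```

A "search within x miles" feature would look up the latitude and longitude of each zip code and keep those whose distance from the target is at most x.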

Simple XML RPC server over HTTPS (Python)

2006-06-07T07:17:34-07:00

Python recipe 496786 by Laszlo Nagy (web).

A simple program that demonstrates how to write an XML-RPC server that uses HTTPS for transporting XML data.

cookielib Example (Python)

2004-12-28T11:26:41-08:00

Python recipe 302930 by Michael Foord (web). Revision 2.

cookielib is a library new to Python 2.4. Prior to Python 2.4 it existed as ClientCookie, but it's not a drop-in replacement: some of the functionality of ClientCookie has been moved into urllib2.

This example shows code for fetching URIs (with cookie handling, including loading and saving) that will work unchanged on:

1) a machine with Python 2.4 (and cookielib)

2) a machine with ClientCookie installed

3) a machine with neither (obviously, on a machine with neither the cookies won't be handled or saved).

Where either cookielib or ClientCookie is available, the cookies will be saved in a file. If that file already exists, the cookies will first be loaded from it. The file format is a useful plain-text format, and the attributes of each cookie are accessible in the CookieJar instance (once loaded).

This may be helpful to those just using ClientCookie as the ClientCookie documentation doesn't appear to document the LWPCookieJar class which is needed for saving and loading cookies.
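In Python 3, cookielib became `http.cookiejar` and urllib2 became `urllib.request`. The load-then-save pattern described above could look roughly like this; the helper name `make_cookie_opener` is hypothetical and the sketch does not include the recipe's ClientCookie fallback.

```python
import http.cookiejar
import os
import urllib.request


def make_cookie_opener(cookie_file):
    """Return (opener, jar); cookies are loaded from cookie_file if it exists."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    if os.path.isfile(cookie_file):
        # ignore_discard=True also restores session cookies saved earlier.
        jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

After fetching pages with `opener.open(url)`, calling `jar.save(ignore_discard=True)` writes the cookies back to the same file, and iterating over `jar` gives access to each cookie's attributes.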

Python FTP Client (Python)

2007-06-21T12:13:54-07:00

Python recipe 521925 by N S (web).

This is a lightweight FTP client. I find it useful for my purposes. You may notice some weird code, but I assure you, it is legitimate. Python was being stubborn, so I had to circumvent some of the rules.

Simple HTTP server based on asyncore/asynchat (Python)

2005-10-16T08:26:59-07:00

Python recipe 259148 by Pierre Quentel (web). Revision 8.

A simple HTTP server, intended to be as simple as the standard module SimpleHTTPServer, built upon the asyncore/asynchat modules (uses non-blocking sockets). Provides a Server (copied from medusa http_server) and a RequestHandler class. RequestHandler handles both GET and POST methods and inherits from SimpleHTTPServer.SimpleHTTPRequestHandler.

It can be easily extended by overriding the handle_data() method in the RequestHandler class.

My first application server (Python)

2009-02-23T11:53:57-08:00

Python recipe 392879 by Pierre Quentel (web). Revision 8.

For those who want to start dynamic web programming but don't know which of the many Python web frameworks to choose, this program might be a good starting point.

ScriptServer is a minimalist application server that handles both GET and POST requests, including multipart/form-data for file uploads and HTTP redirections, with in-memory session management. It can run Python scripts and template files that use the standard string substitution format.

The scripts are run in the same process as the server, avoiding the CGI overhead. The environment variables are provided in the namespace where the script runs.

To start the server, run

python ScriptServer.py

In your web browser, enter http://localhost; this will show you a listing of the directory. Add your scripts to the same directory as ScriptServer.

Simple AJAX with javascript JSON parser (Python)

2005-10-14T06:58:34-07:00

Python recipe 440637 by Wensheng Wang (web). Revision 2.

This JSON parser works well with a stringified Python list or dictionary. It is the JavaScript JSON parser from json.org with a small modification.

HTML Scraper (Python)

2004-09-06T08:18:49-07:00

Python recipe 286269 by Michael Foord (web). Revision 5.

A simple HTML 'parser' that will 'read' through an HTML file and call functions on data and tags etc. Useful if you need to implement a straightforward parser that just extracts information from the file or modifies tags etc.

Shouldn't choke on bad HTML.
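A callback-style scraper of this kind can be sketched with the standard-library `html.parser.HTMLParser` in Python 3. This is an illustration of the approach, not the recipe's own parser; the class name `LinkCollector` and the helper `scrape` are assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags and the non-blank text between tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Called for text between tags.
        if data.strip():
            self.text.append(data.strip())


def scrape(html):
    """Return (links, text_chunks) extracted from an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.links, parser.text
```

HTMLParser does not require well-formed input, so unclosed tags such as a trailing `<b>` are simply reported as start tags without a matching end tag.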

A Simple Webcrawler (Python)

2012-03-03T02:37:30-08:00

Python recipe 578060 by John (crawler, html, page, parser, scraping, urllib, urlopen, web).

This is my simple web crawler. It takes as input a list of seed pages (web urls) and 'scrapes' each page of all its absolute path links (i.e. links in the format http://) and adds those to a dictionary. The web crawler can take all the links found in the seed pages and then scrape those as well. You can continue scraping as deep as you like. You can control how "deep you go" by specifying the depth variable passed into the WebCrawler class function start_crawling(seed_pages,depth). Think of the depth as the recursion depth (or the number of web pages deep you go before returning back up the tree).
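The depth-limited traversal described above can be sketched independently of any networking. In this Python 3 sketch, `crawl` and its `get_links` callback are hypothetical names, and depth 0 is assumed to mean "scrape the seed pages only"; the recipe's own start_crawling may count depth differently.

```python
def crawl(seed_pages, depth, get_links):
    """Return {url: links found on that page} for pages within `depth` hops.

    get_links is any callable that takes a URL and returns a list of URLs;
    in a real crawler it would download and scrape the page.
    """
    found = {}
    frontier = list(seed_pages)
    for _ in range(depth + 1):
        next_frontier = []
        for url in frontier:
            if url in found:
                continue  # already scraped, avoid loops
            found[url] = get_links(url)
            next_frontier.extend(found[url])
        frontier = next_frontier
    return found
```

With a fake link graph such as `{"a": ["b"], "b": ["c"], "c": []}`, a depth of 1 starting from "a" scrapes "a" and "b" but not "c".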

To make this web crawler a little more interesting I added some bells and whistles. I added the ability to pass into the WebCrawler class constructor a regular expression object. The regular expression object is used to "filter" the links found during scraping. For example, in the code below you will see:

cnn_url_regex = re.compile('(?<=[.]cnn)[.]com') # cnn_url_regex is a regular expression object

w = WebCrawler(cnn_url_regex)

This particular regular expression says:

1) Find the first occurrence of the string '.com'

2) Then, looking backwards from where '.com' was found, check that it is immediately preceded by '.cnn'

Why do this?

You can control where the crawler crawls. In this case I am constraining the crawler to operate on webpages within cnn.com.
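The behaviour of this lookbehind pattern can be checked in isolation. The regular expression below is the one from the recipe; the helper name `is_allowed` is an assumption added for illustration.

```python
import re

# Matches '.com' only where it is immediately preceded by '.cnn'.
cnn_url_regex = re.compile('(?<=[.]cnn)[.]com')


def is_allowed(url, pattern=cnn_url_regex):
    """Return True if the crawler's filter would accept this URL."""
    return pattern.search(url) is not None
```

Because the lookbehind requires a literal dot before "cnn", `http://www.cnn.com/` and `http://edition.cnn.com/` pass the filter, while `http://cnn.com/` (no subdomain) and `http://www.bbc.com/` do not.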

Another feature I added was the ability to parse a given page looking for specific html tags. I chose as an example the <h1> tag. Once a <h1> tag is found I store all the words I find in the tag in a dictionary that gets associated with the page url.

Why do this?

My thought was that if I scraped the page for text I could eventually use this data for a search engine request. Say I searched for 'Lebron James'. And suppose that one of the pages my crawler scraped found an article that mentions Lebron James many times. In response to a search request I could return the link with the Lebron James article in it.
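A rough sketch of this idea in Python 3: collect the words inside <h1> tags per page, then look a word up across pages. The names `H1WordParser`, `index_page`, and `search` are hypothetical, and lowercasing the words is an assumption; the recipe's MyHTMLParser and Tag classes work differently.

```python
from html.parser import HTMLParser


class H1WordParser(HTMLParser):
    """Collects the lowercased words that appear inside <h1> tags."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.words.extend(data.lower().split())


def index_page(url, html, index):
    """Store the <h1> words of one page under its URL."""
    parser = H1WordParser()
    parser.feed(html)
    parser.close()
    index[url] = parser.words
    return index


def search(index, word):
    """Return the URLs whose <h1> text contains the given word."""
    return [url for url, words in index.items() if word.lower() in words]
```

Searching the index for "lebron" would then return every scraped page whose heading mentions that word.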

The web crawler is described in the WebCrawler class. It has 2 functions the user should call:

1) start_crawling(seed_pages,depth)

2) print_all_page_text() # this is only used for debug purposes

The rest of WebCrawler's functions are internal functions that should not be called by the user (think private in C++).

Upon construction, a WebCrawler object creates a MyHTMLParser object. The MyHTMLParser class inherits from the built-in Python class HTMLParser. I use the MyHTMLParser object when searching for the <h1> tag. The MyHTMLParser class creates instances of a helper class named Tag. The Tag class is used in creating a "linked list" of tags.

So to get started with WebCrawler make sure to use Python 2.7.2. Enter the code a piece at a time into IDLE in the order displayed below. This ensures that you import libs before you start using them.

Once you have entered all the code into IDLE, you can start crawling the 'interwebs' by entering the following:

import re

cnn_url_regex = re.compile('(?<=[.]cnn)[.]com')

w = WebCrawler(cnn_url_regex)

w.start_crawling(['http://www.cnn.com/2012/02/24/world/americas/haiti-pm-resigns/index.html?hpt=hp_t3'],1)

Of course you can enter any page you want, but the regular expression object is already set up to filter on cnn.com. Remember that the second parameter passed into the start_crawling function is the recursion depth.

Happy Crawling!

Composing a POSTable HTTP request with multipart/form-data Content-Type to simulate a form/file upload. (Python)

2014-03-08T17:34:38-08:00

Python recipe 578846 by István Pásztor (field, file, form, html, httpclient, mime, multipart, post, upload, web). Revision 5.

This code is useful if you are using a http client and you want to simulate a request similar to that of a browser that submits a form containing several input fields (including file upload fields). I've used this with python 2.x.
