This CGI script allows you to specify a URL. It fetches the URL and displays all the headers sent by the server. It is based on the CGI-proxy I'm building. It includes authentication circuitry and I'm using it to understand HTTP authentication.
This script demonstrates using urllib2 to fetch a URL - using a request object with a User-Agent header. It also demonstrates basic authentication and shows the possible HTTP errors - using a dictionary 'borrowed' from BaseHTTPServer.
It will also save cookies using the ClientCookie module, if it's available.
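For readers on Python 3, where urllib2 became urllib.request, the request-building step described above can be sketched roughly as follows. This is only an illustration, not the recipe's own code: `build_request` is a hypothetical helper, the URL and credentials are placeholders, and nothing is sent over the network.

```python
import base64
import urllib.request


def build_request(url, username=None, password=None,
                  user_agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"):
    """Return a Request with a User-Agent and, optionally, Basic credentials.

    Hypothetical helper for illustration; it only builds the request object.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    if username and password:
        # Basic auth is base64("username:password"): an encoding, not encryption.
        raw = ("%s:%s" % (username, password)).encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder URL and credentials (assumptions, not from the recipe).
req = build_request("http://example.com/", "Aladdin", "open sesame")
auth_header = req.get_header("Authorization")
```

Passing `req` to `urllib.request.urlopen` would perform the fetch; the response's `.headers` attribute then plays the role of `u.info()` in the script below.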
# 18-08-04
# v1.1.1
# A simple CGI script, to explore http headers, cookies etc.
# Copyright Michael Foord
# Free to use, modify and relicense.
# No warranty express or implied for the accuracy, fitness to purpose or otherwise for this code....
# Use at your own risk !!!
# E-mail or michael AT foord DOT me DOT uk
# Maintained at
# Imports
import cgitb; cgitb.enable()
import os, sys, cgi, pickle
from time import strftime
import urllib2
sys.stderr = sys.stdout
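READSIZE = 4000                 # number of bytes of page content to display
COOKIEFILE = 'cookies.lwp'      # where the cookies are saved between requests

try:
    import ClientCookie
    openfun = ClientCookie.urlopen
    reqfun = ClientCookie.Request
    cj = ClientCookie.LWPCookieJar()
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)     # reload any cookies saved by an earlier run
    opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
    ClientCookie.install_opener(opener)
except ImportError:             # ClientCookie isn't installed - fall back to plain urllib2
    ClientCookie = None
    openfun = urllib2.urlopen
    reqfun = urllib2.Request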
# Nicked from BaseHTTPServer
# This is the basic table of HTTP errors
errorlist = {
    400: ('Bad Request',
          'The Server thinks your request was malformed.'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Time-out', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service temporarily overloaded',
          'The server cannot process the request due to a high load'),
    504: ('Gateway timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version not supported', 'Cannot fulfill request.'),
}
# Private functions and variables
SCRIPTNAME = os.environ.get('SCRIPT_NAME', '') # the name of the script
versionstring = '1.1.1 18th August, 2004.'
fontline = '<FONT COLOR=#424242 style="font-family:times;font-size:12pt;">'
def getform(valuelist, theform, notpresent=''):
    """This function, given a CGI form, extracts the data from it, based on
    valuelist passed in. Any non-present values are set to '' - although this can be changed.
    (e.g. to return None so you can test for missing keywords - where '' is a valid answer but to have the field missing isn't.)"""
    data = {}
    for field in valuelist:
        if not theform.has_key(field):
            data[field] = notpresent
        else:
            if type(theform[field]) != type([]):
                data[field] = theform[field].value
            else:
                values = map(lambda x: x.value, theform[field])     # allows for list type values
                data[field] = values
    return data
errormess = "<H1>An Error Has Occurred</H1><BR><B><PRE>"
theformhead = """<HTML><HEAD><TITLE> - Playing With Headers and Cookies</TITLE></HEAD>
<H1>Welcome to - <BR>a Python CGI</H1>
<B><I>By Fuzzyman</I></B><BR>
"""+fontline +"Version : " + versionstring + """, Running on : """ + strftime('%I:%M %p, %A %d %B, %Y')+'''.</CENTER>'''
HR = '<BR><BR><HR><BR><BR>'
theform = """This CGI script allows you to specify a URL using the form below.<BR>
It will take a look at the specified URL and print the headers from the server.<BR>
It will also print the cookies which ought to be managed by the ClientCookie module.<BR>
<H2>Enter the Location</H2>
<FORM METHOD=\"""" + METHOD + '" action="' + SCRIPTNAME + """\">
<input name=url type=text size=45 value=\"%s\" ><BR>
<input type=submit value="Submit"><BR>
</FORM>
<BR><BR><HR><BR><A href="">Voidspace Pythonutils Page</A>
"""
authmess = """<HTML><HEAD><TITLE>Authentication Required</TITLE></HEAD>
<H1>Authentication Required</H1>
<B><I> By Fuzzyman</I></B><BR>
"""+fontline +"Version : " + versionstring + """, Running on : """ + strftime('%I:%M %p, %A %d %B, %Y')+'''.</CENTER><BR>
<BR>Please enter your username and password below.<BR>
<FORM METHOD=\"''' + METHOD2 + '" action="' + SCRIPTNAME + """\">Username :
<input name="name" type=text><BR>Password :
<input name="pass" type=password><BR>
<input type=hidden value="%s" name="theurl">
<input type=submit value="Submit">
</FORM>
"""
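# Takes three values: page title, heading and description.
# Only the first line survives from the original; the body markup below is assumed.
err_mess = """<HTML><HEAD><TITLE>%s</TITLE></HEAD>
<H1>%s</H1>
<H2>%s</H2>
"""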
# main body of the script
if __name__ == '__main__':
    print "Content-type: text/html"         # this is the header to the server
    print                                   # so is this blank line
    form = cgi.FieldStorage()
    data = getform(['url', 'name', 'pass', 'theurl'], form)
    print theformhead

    theurl = data['theurl'] or data['url']
    if not SCRIPTNAME: theurl = ''
    info = 'An error occurred before we got the headers.'
    e = ''
    if not theurl:
        print theform % ''
    else:
        if theurl.find(':') == -1: theurl = 'http://' + theurl
        try:
            req = reqfun(theurl, None, {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
            if data['name'] and data['pass']:
                import base64
                base64string = base64.encodestring('%s:%s' % (data['name'], data['pass']))[:-1]
                req.add_header("Authorization", "Basic %s" % base64string)
            u = openfun(req)
            info = u.info()                 # the headers sent by the server
        except Exception, e:                # an error in fetching the page
            if not hasattr(e, 'code'):      # Means the page doesn't exist
                the_err = errorlist[404]
                print err_mess % (the_err[0], the_err[0], the_err[1])
            elif e.code == 401:             # authentication
                print authmess % (theurl)
            elif e.code in errorlist:       # standard http errors
                the_err = errorlist[e.code]
                print err_mess % (the_err[0], the_err[0], the_err[1])
            else:                           # any others (unknown error - shouldn't happen)
                print errormess

        print HR
        print '<PRE>'
        print info
        if e:                               # If an error has occurred - this ought to show the details
            print 'The Error : '
            print e
            print '\nAttributes of the python error object :'
            print dir(e)
            if hasattr(e, 'code'):
                print '\nThe Headers :'
                print e.headers
        if ClientCookie:
            print 'Cookies :'
            a = 0
            for c in cj:
                a += 1
                print a, c.__repr__()
        else:
            print "ClientCookie isn't installed - so cookie stuff don't work !"
        if not e:                           # u only exists if the fetch succeeded
            print 'Content (first', READSIZE, 'bytes) :'
            print u.read(READSIZE).replace('<', '&lt;')
        print '</PRE>'
        print HR
        print theform % theurl
        if ClientCookie:
            cj.save(COOKIEFILE)             # keep the cookies for the next request
"""
TODO/ISSUES
Work out what a realm is !

CHANGELOG
18-08-04        Version 1.1.1
Won't crash if ClientCookie isn't available.

12-08-04        Version 1.1.0
Added support for ClientCookie
Now displays the first 4000 bytes of content too.
My birthday.

02-08-04        Version 1.0.0
My first wedding anniversary.
"""
I'm exploring the issue of HTTP authentication - basic, digest, etc. This script will fetch a URL and display all the headers from the server. If you get a 401 error it will still display the headers - but let you authenticate.
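When a server answers 401, the header worth looking at is WWW-Authenticate: it names the scheme (Basic, Digest, ...) and usually a realm, which is the server's label for the protected area the credentials apply to. A minimal Python 3 sketch of pulling those apart is below. `parse_challenge` is a hypothetical helper, the sample header values are made up for illustration, and it only handles a single challenge per header, which real servers do not always guarantee.

```python
import re


def parse_challenge(header_value):
    """Split one WWW-Authenticate value into (scheme, params).

    Hypothetical helper: handles a single challenge with key=value
    parameters, either quoted or bare. Not a full RFC-grade parser.
    """
    scheme, _, rest = header_value.strip().partition(" ")
    params = {}
    pattern = r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s,]+))'
    for key, quoted, bare in re.findall(pattern, rest):
        params[key.lower()] = quoted or bare
    return scheme.lower(), params


# Made-up sample headers, as a server might send with a 401 response.
basic = parse_challenge('Basic realm="Secure Area"')
digest = parse_challenge(
    'Digest realm="testrealm@host.com", qop="auth,auth-int", nonce="abc123"')
```

With urllib.request, the header value would come from the caught `urllib.error.HTTPError`, for example `err.headers.get("WWW-Authenticate")`.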
This demonstrates various basic aspects of CGI and fetching URLs. You can use it/see it in operation at :
This is now updated to save cookies if the ClientCookie module is available.
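ClientCookie's cookie handling later became part of the standard library (cookielib in Python 2.4, http.cookiejar in Python 3), so the load, fetch, save cycle can be sketched without the extra module. The example below is an assumption-laden illustration rather than the recipe's code: it writes to a temporary directory, makes no network request, and uses a hand-made cookie with placeholder values to stand in for one a server would set.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Temporary location so the sketch does not touch a real cookies.lwp file.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.lwp")

jar = http.cookiejar.LWPCookieJar(cookie_file)
if os.path.isfile(cookie_file):
    jar.load(ignore_discard=True)       # reload cookies from an earlier run

# An opener built this way sends and collects cookies on opener.open(url).
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Placeholder cookie (hypothetical values) instead of a real server response.
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))
jar.save(ignore_discard=True)           # session cookies are skipped otherwise

# Read the file back into a fresh jar to confirm the round trip.
reloaded = http.cookiejar.LWPCookieJar(cookie_file)
reloaded.load(ignore_discard=True)
names = [c.name for c in reloaded]
```

The `ignore_discard=True` flag matters here because session cookies (those without an expiry) are otherwise neither saved nor loaded.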
Updated to Include Proper Authentication. The version saved over at voidspace includes my updated take on authentication - which saves the username/password temporarily and will fetch another page and automatically authenticate. You can download the updated version or try the online one. I'm not putting the changes here as this is already too big to be called a recipe....