
This CGI script lets you specify a URL. It fetches the URL and displays all the headers sent by the server. It is based on approx.py, the CGI proxy I'm building. It includes authentication circuitry, and I'm using it to understand HTTP authentication.

This script demonstrates using urllib2 to fetch a URL, using a request object with a User-Agent header. It also demonstrates basic authentication and shows the possible HTTP errors, using a dictionary 'borrowed' from BaseHTTPServer.
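For readers on Python 3, where urllib2 became urllib.request, the request-building step might look roughly like the sketch below. It only builds the request and sends nothing over the network. The helper name build_request and the example.com URL are assumptions made for illustration; they are not part of the recipe.

```python
import base64
import urllib.request

# Same User-Agent string the recipe sends.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(url, username=None, password=None):
    """Build (but do not send) a request with a User-Agent header and,
    if credentials are supplied, a Basic Authorization header.

    Hypothetical helper, not taken from the recipe.
    """
    req = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    if username and password:
        raw = ('%s:%s' % (username, password)).encode('utf-8')
        # b64encode never inserts line breaks, so nothing needs trimming.
        token = base64.b64encode(raw).decode('ascii')
        req.add_header('Authorization', 'Basic %s' % token)
    return req


req = build_request('http://www.example.com/', 'user', 'secret')
# Request stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
print(req.get_header('Authorization'))
```

Passing the finished request to urllib.request.urlopen would perform the fetch; the response's headers attribute then plays the role of u.info() in the listing.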

It will also save cookies using the ClientCookie module, if it's available.
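A rough Python 3 equivalent of the cookie handling, assuming the standard library's http.cookiejar in place of ClientCookie, could look like this. The temporary directory is an assumption so the sketch does not touch any real cookie file, and no URL is actually fetched.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Assumed location: a fresh temporary directory, so no existing file is reused.
COOKIEFILE = os.path.join(tempfile.mkdtemp(), 'cookies.lwp')

cj = http.cookiejar.LWPCookieJar(COOKIEFILE)
if os.path.isfile(COOKIEFILE):
    # Reload cookies saved by an earlier run.
    cj.load(ignore_discard=True)

# An opener that sends stored cookies and records any the server sets.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# opener.open(some_url) would go here; afterwards list what was collected.
for number, cookie in enumerate(cj, 1):
    print(number, repr(cookie))

# Persist the jar, including session cookies, for the next run.
cj.save(ignore_discard=True)
```

Because nothing is fetched here, the loop prints nothing and the saved file holds only the LWP header line.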

Python, 267 lines
#!/usr/bin/python -u
# 18-08-04
# v1.1.1

# http.py
# A simple CGI script, to explore http headers, cookies etc.

# Copyright Michael Foord
# Free to use, modify and relicense.
# No warranty express or implied for the accuracy, fitness to purpose or otherwise for this code....
# Use at your own risk !!!

# E-mail or michael AT foord DOT me DOT uk
# Maintained at www.voidspace.org.uk/atlantibots/pythonutils.html

"""
This CGI script allows you to specify a URL using an HTML form.
It will fetch the specified URL and print the headers from the server.
It will also handle cookies using ClientCookie - if it's available.

It is based on approx.py the CGI-proxy I'm building.
It includes authentication circuitry and I'm using it to understand http authentication.

This script shows using urllib2 to fetch a URL with a request object including User-Agent header and basic authentication.
It also shows the possible http errors - using a dictionary 'borrowed' from BaseHTTPServer
"""

################################################################
# Imports

try:
    import cgitb; cgitb.enable()
except:
    pass
import os, sys, cgi
from time import strftime
import urllib2

sys.stderr = sys.stdout

READSIZE = 4000
COOKIEFILE = 'cookies.lwp'

try:
    import ClientCookie
    openfun = ClientCookie.urlopen
    reqfun = ClientCookie.Request
    cj = ClientCookie.LWPCookieJar()
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
    opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
    ClientCookie.install_opener(opener)
except:
    ClientCookie = None
    openfun = urllib2.urlopen
    reqfun = urllib2.Request


###############################################################
# Nicked from BaseHTTPServer
# This is the basic table of HTTP errors

errorlist = {   400: ('Bad Request',
                      'The Server thinks your request was malformed.'),
         401: ('Unauthorized',
              'No permission -- see authorization schemes'),
        402: ('Payment required',
              'No payment -- see charging schemes'),
        403: ('Forbidden',
              'Request forbidden -- authorization will not help'),
        404: ('Not Found', 'Nothing matches the given URI'),
        405: ('Method Not Allowed',
              'Specified method is invalid for this server.'),
        406: ('Not Acceptable', 'URI not available in preferred format.'),
        407: ('Proxy Authentication Required', 'You must authenticate with '
              'this proxy before proceeding.'),
        408: ('Request Time-out', 'Request timed out; try again later.'),
        409: ('Conflict', 'Request conflict.'),
        410: ('Gone',
              'URI no longer exists and has been permanently removed.'),
        411: ('Length Required', 'Client must specify Content-Length.'),
        412: ('Precondition Failed', 'Precondition in headers is false.'),
        413: ('Request Entity Too Large', 'Entity is too large.'),
        414: ('Request-URI Too Long', 'URI is too long.'),
        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
        416: ('Requested Range Not Satisfiable',
              'Cannot satisfy request range.'),
        417: ('Expectation Failed',
              'Expect condition could not be satisfied.'),

        500: ('Internal error', 'Server got itself in trouble'),
        501: ('Not Implemented',
              'Server does not support this operation'),
        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
        503: ('Service temporarily overloaded',
              'The server cannot process the request due to a high load'),
        504: ('Gateway timeout',
              'The gateway server did not receive a timely response'),
        505: ('HTTP Version not supported', 'Cannot fulfill request.')
                      }
################################################################
# Private functions and variables

SCRIPTNAME = os.environ.get('SCRIPT_NAME', '')                        # the name of the script
versionstring = '1.1.1 18th August, 2004.'
fontline = '<FONT COLOR=#424242 style="font-family:times;font-size:12pt;">'

METHOD = 'GET'
METHOD2 = 'POST'

def getform(valuelist, theform, notpresent=''):
    """This function, given a CGI form, extracts the data from it, based on
    valuelist passed in. Any non-present values are set to '' - although this can be changed.
    (e.g. to return None so you can test for missing keywords - where '' is a valid answer but to have the field missing isn't.)"""
    data = {}
    for field in valuelist:
        if not theform.has_key(field):
            data[field] = notpresent
        else:
            if  type(theform[field]) != type([]):
                data[field] = theform[field].value
            else:
                values = map(lambda x: x.value, theform[field])     # allows for list type values
                data[field] = values
    return data

errormess = "<H1>An Error Has Occurred</H1><BR><B><PRE>"

theformhead = """<HTML><HEAD><TITLE>http.py - Playing With Headers and Cookies</TITLE></HEAD>
<BODY><CENTER>
<H1>Welcome to http.py - <BR>a Python CGI</H1>
<B><I>By Fuzzyman</I></B><BR>
"""+fontline +"Version : " + versionstring + """, Running on : """ + strftime('%I:%M %p, %A %d %B, %Y')+'''.</CENTER>
<BR>'''

HR = '<BR><BR><HR><BR><BR>'

theform = """This CGI script allows you to specify a URL using the form below.<BR>
It will take a look at the specified URL and print the headers from the server.<BR>
It will also print the cookies which ought to be managed by the ClientCookie module.<BR>
<BR>
<H2>Enter the Location</H2>
<FORM METHOD=\"""" + METHOD + '" action="' + SCRIPTNAME + """\">
<input name=url type=text size=45 value=\"%s\" ><BR>
<input type=submit value="Submit"><BR>
</FORM>
<BR><BR><HR><BR><A href="http://www.voidspace.org.uk/atlantibots/pythonutils.html">Voidspace Pythonutils Page</A>
</BODY>
</HTML>
"""

authmess = """<HTML><HEAD><TITLE>Authentication Required</TITLE></HEAD>
<BODY><CENTER>
<H1>Authentication Required</H1>
<B><I>http.py By Fuzzyman</I></B><BR>
"""+fontline +"Version : " + versionstring + """, Running on : """ + strftime('%I:%M %p, %A %d %B, %Y')+'''.</CENTER><BR>
<BR>Please enter your username and password below.<BR>
<FORM METHOD=\"''' + METHOD2 + '" action="' + SCRIPTNAME + """\">Username : 
<input name="name" type=text><BR>Password : 
<input name="pass" type=password><BR>
<input type=hidden value="%s" name="theurl">
<input type=submit value="Submit">
</FORM>
<BR><BR>
"""


err_mess = """<HTML><HEAD><TITLE>%s</TITLE></HEAD>
<BODY><CENTER>
<H1>%s</H1>
<H2>%s</H2>
</CENTER>"""

################################################################
# main body of the script

if __name__ == '__main__':
    print "Content-type: text/html"         # this is the header to the server
    print                                   # so is this blank line
    form = cgi.FieldStorage()           
    data = getform(['url', 'name', 'pass', 'theurl'], form)
    print theformhead
    theurl = data['theurl'] or data['url']
    if not SCRIPTNAME: theurl = 'http://www.google.com/search?hl=en&ie=UTF-8&q=hello&btnG=Google+Search'
    info = 'An error occurred before we got the headers.'
    e = ''
    if not theurl: 
        print theform % ''
    else:
        if theurl.find(':') == -1: theurl = 'http://' + theurl
        try:
            req = reqfun(theurl, None, {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
            if data['name'] and data['pass']:
                import base64
                base64string = base64.encodestring('%s:%s' % (data['name'], data['pass'])).replace('\n', '')   # encodestring wraps long output, so strip every newline
                req.add_header("Authorization", "Basic %s" % base64string)
            u = openfun(req)
            info = u.info()
        except Exception, e:                      # an error in fetching the page
            if not hasattr(e, 'code'):                  # no HTTP status (e.g. the server couldn't be reached) - reported as a 404
                the_err = errorlist[404]
                print err_mess % (the_err[0], the_err[0], the_err[1])
                
            elif e.code  == 401:                        # authentication
                print authmess % cgi.escape(theurl, 1)  # escape the URL before embedding it in an HTML attribute

            elif e.code in errorlist:                   # standard http errors
                the_err = errorlist[e.code]
                print err_mess % (the_err[0], the_err[0], the_err[1])
                
            else:                                       # any others (unknown error - shouldn't happen)
                raise         

        print HR
        print '<PRE>'
        print info
        print
        if e:                               # If an error has occurred - this ought to show the details
            print 'The Error : '
            print e
            print '\nAttributes of the python error object :'
            print dir(e)
            if hasattr(e, 'code'):
                print '\nThe Headers :' 
                print e.headers

        if ClientCookie:
            print
            print 'Cookies :'
            a = 0
            for c in cj:
                a += 1
                print a, c.__repr__()
        else:
            print
            print "ClientCookie isn't installed - so cookie stuff don't work !"

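        if not e:                           # only read the body if the fetch succeeded, otherwise u is undefined
            print
            print 'Content (first', READSIZE, 'bytes) :'
            print u.read(READSIZE).replace('&', '&amp;').replace('<', '&lt;')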
        print '</PRE>'
        print HR
        print theform % cgi.escape(theurl, 1)   # escape the URL before embedding it in an HTML attribute

        
        if ClientCookie:
            cj.save(COOKIEFILE)
            


            
"""
TODO/ISSUES
Work out what a realm is !


CHANGELOG
18-08-04        Version 1.1.1
Won't crash if ClientCookie isn't available.

12-08-04        Version 1.1.0
Added support for ClientCookie
Now displays the first 4000 bytes of content too.
My birthday.

02-08-04        Version 1.0.0
My first wedding anniversary.
"""

I'm exploring the issue of HTTP authentication (basic, digest, etc.). This script will fetch a URL and display all the headers from the server. If you get a 401 error it will still display the headers, but let you authenticate.
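The branching on a failed fetch can be sketched in Python 3 as below. This is an illustration under assumptions, not the recipe's code: classify_error and its action labels are invented names, http.HTTPStatus stands in for the dictionary borrowed from BaseHTTPServer, and the 401 error is built by hand so that nothing is fetched.

```python
import io
import urllib.error
from email.message import Message
from http import HTTPStatus


def classify_error(exc):
    """Return (action, title) for an exception raised while fetching a URL.

    Hypothetical helper mirroring the branches in the listing.
    """
    code = getattr(exc, 'code', None)
    if code is None:
        # URLError and friends carry no HTTP status at all.
        return ('no-response', 'No HTTP status received')
    if code == 401:
        # Still an error, but one the user can fix by logging in.
        return ('authenticate', HTTPStatus(code).phrase)
    try:
        return ('report', HTTPStatus(code).phrase)
    except ValueError:
        # A status code the standard table does not know about.
        return ('unknown', 'HTTP %s' % code)


# Hand-built 401 response headers and error, for illustration only.
hdrs = Message()
hdrs['WWW-Authenticate'] = 'Basic realm="example"'
err = urllib.error.HTTPError('http://www.example.com/private', 401,
                             'Unauthorized', hdrs, io.BytesIO(b''))

print(classify_error(err))
# The headers stay available on the error object, so they can be shown.
print(err.headers['WWW-Authenticate'])
```

The WWW-Authenticate header on a 401 response names the scheme and the realm, which is where the realm mentioned in the TODO list comes from.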

This demonstrates various basic aspects of CGI and fetching URLs. You can use it/see it in operation at : http://www.voidspace.xennos.com/index.htm#http
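On the CGI side, the getform helper can be approximated in Python 3 without the cgi module by parsing the query string directly. The sketch below assumes the form data arrives as a query string (as with the GET form in the listing) and is not a drop-in replacement for cgi.FieldStorage.

```python
from urllib.parse import parse_qs


def getform(valuelist, query_string, notpresent=''):
    """Extract the named fields from a query string.

    Missing fields get `notpresent`; fields that appear more than once
    are returned as a list, as in the original helper.
    """
    parsed = parse_qs(query_string)
    data = {}
    for field in valuelist:
        values = parsed.get(field)
        if not values:
            data[field] = notpresent
        elif len(values) == 1:
            data[field] = values[0]
        else:
            data[field] = values
    return data


# In a real CGI script the string would come from os.environ['QUERY_STRING'].
print(getform(['url', 'name'], 'url=www.example.com&tag=a'))
```

Fields that are not asked for (tag in this example) are ignored, and name falls back to the empty string.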

This is now updated to save cookies if the ClientCookie module is available.

1 comment

Michael Foord (author) 17 years, 2 months ago

Updated to Include Proper Authentication. The version saved over at voidspace includes my updated take on authentication - which saves the username/password temporarily and will fetch another page and automatically authenticate. You can download the updated version or try the online one. I'm not putting the changes here as this is already too big to be called a recipe....

http://www.voidspace.org.uk/atlantibots/recipebook.html#http