
Need to make sure the HTTP headers going to and from the server are right? Or do you just want to watch them, as you would in Firefox with LiveHTTPHeaders? Use this custom processor to log them. Note that I used the ClientCookie package, but this should also work with plain urllib2, and it should be adaptable to Python 2.4's cookielib.

Python, 46 lines
import sys
import ClientCookie

class HTTPMyDebugProcessor(ClientCookie.BaseHandler):
    """ Track HTTP requests and responses with this custom handler.
    Be sure to add it last in your build_opener call, or use:
        handler_order = 900 """
    def __init__(self, httpout=sys.stdout):
        self.httpout = httpout

    def http_request(self, request):
        if __debug__:
            host, full_url = request.get_host(), request.get_full_url()
            url_path = full_url[full_url.find(host) + len(host):]
            self.httpout.write("%s\n" % request.get_full_url())
            self.httpout.write('\n')
            self.httpout.write("%s %s\n" % (request.get_method(), url_path))

            for header in request.header_items():
                self.httpout.write("%s: %s\n" % header[:])

            self.httpout.write('\n')

        return request

    def http_response(self, request, response):
        if __debug__:
            code, msg, hdrs = response.code, response.msg, response.info()
            self.httpout.write("HTTP/1.x %s %s\n" % (code, msg))
            self.httpout.write(str(hdrs))

        return response

    https_request = http_request
    https_response = http_response

# Example
cjar = ClientCookie.LWPCookieJar()
opener = ClientCookie.build_opener(
    ClientCookie.HTTPCookieProcessor(cjar),
    ClientCookie.HTTPRefererProcessor(),
    HTTPMyDebugProcessor(),
)
ClientCookie.install_opener(opener)
response = ClientCookie.urlopen("http://www.google.com")
#...

Here is what the above prints to the default sys.stdout:

http://www.google.com

GET /
Host: www.google.com
User-agent: Python-urllib/2.1

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=892f9b970973d9ba:TM=1127155707:LM=1127155707:S=3wopqzsFea4_fMVK; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Content-Length: 2429
Date: Mon, 19 Sep 2005 18:48:27 GMT

When I first started using urllib2, I could not find an easy way to see my requests and verify that the headers were correct. I soon discovered that I needed a custom processor that would not disturb the request or the response, but would let me examine them. HTTPMyDebugProcessor was especially useful when debugging cookie issues.
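
For readers on Python 3, where urllib2 became urllib.request, the same pass-through idea can be sketched roughly as follows. This is an untested-against-your-server port rather than the original recipe: the names HTTPDebugProcessor and build_debug_opener are made up for illustration, and the offline check at the bottom uses a placeholder URL without opening a connection.

```python
import io
import sys
import urllib.request


class HTTPDebugProcessor(urllib.request.BaseHandler):
    """Log each request and response to a stream without modifying them."""

    # BaseHandler's default order is 500; a higher value runs this processor
    # after the defaults, so headers they add (cookies, Host) are visible.
    handler_order = 900

    def __init__(self, httpout=sys.stdout):
        self.httpout = httpout

    def http_request(self, request):
        # selector is the path plus query string; it is empty for a bare host.
        self.httpout.write("%s\n\n" % request.full_url)
        self.httpout.write(
            "%s %s\n" % (request.get_method(), request.selector or "/")
        )
        for name, value in request.header_items():
            self.httpout.write("%s: %s\n" % (name, value))
        self.httpout.write("\n")
        return request  # returned unchanged

    def http_response(self, request, response):
        self.httpout.write("HTTP/1.x %s %s\n" % (response.code, response.msg))
        self.httpout.write(str(response.info()))
        return response  # returned unchanged

    https_request = http_request
    https_response = http_response


def build_debug_opener(httpout=sys.stdout):
    """Hypothetical helper: cookie handling plus the debug processor."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(),
        HTTPDebugProcessor(httpout),
    )


# Offline check: run one request through the processor, no network involved.
log = io.StringIO()
HTTPDebugProcessor(log).http_request(
    urllib.request.Request("http://www.example.com/search?q=1")
)
print(log.getvalue())
```

To watch real traffic, call build_debug_opener().open(url) with a URL of your own; the processor then logs both directions for http and https.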

Comments

Brandon Beck 18 years, 7 months ago

Alternative way... I recently needed to debug what was going on in urllib2 as well. After investigating the urllib2 source and the underlying httplib source, I found that subclasses of urllib2.AbstractHTTPHandler accept an optional debuglevel parameter in their __init__ method. They then pass that debug level on to the httplib.HTTPConnection object, which causes the information you're looking for to be displayed at runtime. Here's the solution I came up with:

import cookielib
import urllib2

def build_opener(debug=False):
    # Create an HTTP and HTTPS handler with the appropriate debug
    # level.  We intentionally create a new one because the
    # OpenerDirector class in urllib2 is smart enough to replace
    # its internal versions with ours if we pass them into the
    # urllib2.build_opener method.  This is much easier than trying
    # to introspect into the OpenerDirector to find the existing
    # handlers.
    http_handler = urllib2.HTTPHandler(debuglevel=debug)
    https_handler = urllib2.HTTPSHandler(debuglevel=debug)

    # We want to process cookies, but only in memory so just use
    # a basic memory-only cookie jar instance
    cookie_jar = cookielib.CookieJar()
    cookie_handler = urllib2.HTTPCookieProcessor(cookie_jar)

    handlers = [http_handler, https_handler, cookie_handler]
    opener = urllib2.build_opener(handlers)

    # Save the cookie jar with the opener just in case it's needed
    # later on
    opener.cookie_jar = cookie_jar

    return opener
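
On Python 3 the same debuglevel approach would look roughly like the sketch below, assuming urllib.request and http.cookiejar in place of urllib2 and cookielib. The function name build_opener_with_debuglevel is hypothetical, and the handlers are unpacked with * because build_opener takes them as separate positional arguments.

```python
import http.cookiejar
import urllib.request


def build_opener_with_debuglevel(debug=False):
    # A debuglevel above 0 makes http.client print the request and the
    # response headers to stdout while the connection is in use.
    level = 1 if debug else 0
    handlers = [
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    ]

    # In-memory cookie jar only; nothing is written to disk.
    cookie_jar = http.cookiejar.CookieJar()
    handlers.append(urllib.request.HTTPCookieProcessor(cookie_jar))

    # Passing instances replaces the default handlers of the same class.
    opener = urllib.request.build_opener(*handlers)

    # Keep the jar reachable from the opener in case it is needed later.
    opener.cookie_jar = cookie_jar
    return opener
```

Compared with a custom processor, this prints whatever http.client chooses to print and always writes to stdout, so it is quicker to set up but harder to redirect or reformat.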

stevekeogh 18 years, 7 months ago

Cookie Handling. I'm a rank amateur with Python and I'm trying to access this URL to scrape score info:

http://www.dreamteamfc.com/dtfc05i/servlet/PlayerProfile?playerid=18307&gameid=183

However, it redirects if you try to access it directly. If you try again (using a browser), it reads a cookie and grants access.

How can I repeat this behaviour using urllib2?

Thanks.

Alberto Planas 17 years, 4 months ago

A correction. As I can see in urllib2.py, the build_opener function has this signature:

def build_opener(*handlers):

So I think we must replace this line in your source code:

opener = urllib2.build_opener(handlers)

with this one:

opener = urllib2.build_opener(*handlers)
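
The difference is easy to check without touching the network. The sketch below uses Python 3's urllib.request rather than urllib2 (an assumption on my part, since that is what I can run); there, as far as I can tell, a list passed as a single argument is rejected with TypeError because a list is not a handler instance, while the unpacked form is accepted.

```python
import urllib.request

handlers = [
    urllib.request.HTTPHandler(),
    urllib.request.HTTPCookieProcessor(),
]

# Passing the list itself hands build_opener one argument: the list.
try:
    urllib.request.build_opener(handlers)
    list_accepted = True
except TypeError:
    list_accepted = False

# Unpacking passes each handler as its own positional argument.
opener = urllib.request.build_opener(*handlers)

print("list accepted:", list_accepted)
print("custom handlers installed:", all(h in opener.handlers for h in handlers))
```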

Jonathan Cano 17 years, 2 months ago

Bless you! I've never used cookielib and urllib2 before. I wanted a simple "hello world" style program to show me how things worked. It is remarkable that your example is not in the official documentation!

I spent an hour looking over the official docs for urllib2 and cookielib and got nowhere. I downloaded your example, changed the URL to the site I wanted to test against and VOILA, it worked!

Now I am off and running! I'm sure your example has saved countless people hours of wrestling with documentation!

Created by John Pywtorak on Mon, 19 Sep 2005 (PSF)