Say you need to make sure the HTTP headers sent to and received from a server are correct, or you simply want to track them, much as Firefox's LiveHTTPHeaders extension does. Use this custom processor to watch them. Note that I used the ClientCookie package, but this should work with urllib2 without ClientCookie, and it should also be adaptable to Python 2.4's cookielib.
import sys
import ClientCookie

class HTTPMyDebugProcessor(ClientCookie.BaseHandler):
    """ Track HTTP requests and responses with this custom handler.
    Be sure to add it last in your build_opener call, or use:
    handler_order = 900 """

    def __init__(self, httpout=sys.stdout):
        self.httpout = httpout

    def http_request(self, request):
        if __debug__:
            host, full_url = request.get_host(), request.get_full_url()
            url_path = full_url[full_url.find(host) + len(host):]
            self.httpout.write("%s\n" % request.get_full_url())
            self.httpout.write('\n')
            self.httpout.write("%s %s\n" % (request.get_method(), url_path))
            for header in request.header_items():
                self.httpout.write("%s: %s\n" % header[:])
            self.httpout.write('\n')
        return request

    def http_response(self, request, response):
        if __debug__:
            code, msg, hdrs = response.code, response.msg, response.info()
            self.httpout.write("HTTP/1.x %s %s\n" % (code, msg))
            self.httpout.write(str(hdrs))
        return response

    https_request = http_request
    https_response = http_response

# Example
cjar = ClientCookie.LWPCookieJar()
opener = ClientCookie.build_opener(
    ClientCookie.HTTPCookieProcessor(cjar),
    ClientCookie.HTTPRefererProcessor(),
    HTTPMyDebugProcessor(),
)
ClientCookie.install_opener(opener)
response = ClientCookie.urlopen("http://www.google.com")
#...
Here is what the above looks like on the default sys.stdout:

http://www.google.com

GET /
Host: www.google.com
User-agent: Python-urllib/2.1

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=892f9b970973d9ba:TM=1127155707:LM=1127155707:S=3wopqzsFea4_fMVK; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Content-Length: 2429
Date: Mon, 19 Sep 2005 18:48:27 GMT
When I first started using urllib2 I could not find an easy way to see my request and verify that the headers were correct. I soon discovered that I needed a custom processor that would not disturb the request or response, but would allow me to examine them. HTTPMyDebugProcessor was especially useful when debugging cookie issues.
Alternative way... I recently needed to debug what was going on in urllib2 as well. Investigating the urllib2 source and the underlying httplib source, I found that subclasses of urllib2.AbstractHTTPHandler accept an optional debuglevel parameter in their __init__ method and pass it on to the httplib.HTTPConnection object. This causes the information you're looking for to be printed at runtime. Here's the solution I came up with:
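A minimal sketch of that approach (the Python 3 import fallback to urllib.request is my addition, and the actual fetch is left commented out so the sketch stays self-contained):

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # the same handlers live here on Python 3

def make_debug_opener():
    # debuglevel=1 is passed through to httplib.HTTPConnection, which then
    # prints each request and response line to stdout as the connection runs.
    return urllib2.build_opener(
        urllib2.HTTPHandler(debuglevel=1),
        urllib2.HTTPSHandler(debuglevel=1),
    )

opener = make_debug_opener()
urllib2.install_opener(opener)
# urllib2.urlopen("http://www.google.com")  # would now print the HTTP exchange
```

Unlike the HTTPMyDebugProcessor recipe above, this prints httplib's own trace rather than letting you capture the headers programmatically, but it requires no custom handler class.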
Cookie Handling. I'm a rank amateur with Python and I'm trying to access this URL to scrape score info:
http://www.dreamteamfc.com/dtfc05i/servlet/PlayerProfile?playerid=18307&gameid=183
However, it uses a redirect if you try to access it directly. If you try again (using a browser), it reads a cookie and grants access.
How can I repeat this behaviour using urllib2?
Thanks.
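One way to reproduce that browser behaviour, sketched with the standard library only (fetch_with_cookies is a name I made up, and the assumption is that the site merely checks for a cookie it set during the redirect):

```python
try:
    import urllib2, cookielib  # Python 2
except ImportError:
    import urllib.request as urllib2      # Python 3 equivalents
    import http.cookiejar as cookielib

def fetch_with_cookies(url):
    # The CookieJar captures any Set-Cookie headers seen along the way,
    # including ones delivered during the redirect, and replays them on
    # every later request made through the same opener.
    cjar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cjar))
    opener.open(url)         # first hit: redirected, cookie gets stored
    return opener.open(url)  # second hit: cookie is sent back, access granted
```

The key point is to reuse one opener (or one CookieJar) across both requests; a fresh urlopen call without a cookie processor starts with no cookies every time.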
A correction. As I see in urllib2.py, the build_opener function has this signature:
So I think that we must change this line in your source code:
for this one:
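Without the quoted lines it is hard to say exactly which change was meant, but for reference, build_opener in urllib2.py (unchanged in Python 3's urllib.request) takes its handlers as varargs:

```python
# The signature in question:
#
#     def build_opener(*handlers):
#         ...
#
# so handler classes or instances are passed as separate positional
# arguments, not as a single list.
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# build_opener accepts either classes or instances; bare classes are
# instantiated for you.
opener = urllib2.build_opener(urllib2.HTTPHandler, urllib2.HTTPSHandler())
```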
bless you! I've never used cookielib and urllib2 before. I wanted a simple "hello world" style program to show me how things worked. It is remarkable that your example is not in the official documentation!
I spent an hour looking over the official docs for urllib2 and cookielib and got nowhere. I downloaded your example, changed the URL to the site I wanted to test against and VOILA, it worked!
Now I am off and running! I'm sure your example has saved countless people hours of wrestling with documentation!