
cookielib is a library new to Python 2.4. Prior to Python 2.4 it existed as ClientCookie, but it's not a drop-in replacement: some of the functionality of ClientCookie has been moved into urllib2.

This example shows code for fetching URIs (with cookie handling, including loading and saving) that will work unchanged on a machine with Python 2.4 (and cookielib), a machine with ClientCookie installed, or a machine with neither. (Obviously, on a machine with neither, the cookies won't be handled or saved.)

Where either cookielib or ClientCookie is available, the cookies will be saved in a file. If that file already exists, the cookies will first be loaded from it. The file format is a useful plain text format, and the attributes of each cookie are accessible in the CookieJar instance (once loaded).
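As a rough illustration of the save/load round trip described above, here is a minimal sketch that does not touch the network. It assumes a cookie built by hand (the name 'theme', the value 'dark' and the domain www.example.com are made up), writes to a temporary file rather than the recipe's COOKIEFILE, and falls back to http.cookiejar, the Python 3 name of cookielib, which is not something the original recipe does.

```python
import os
import tempfile
import time

try:
    import cookielib                      # Python 2.4 and later 2.x
except ImportError:
    import http.cookiejar as cookielib    # same module under its Python 3 name


def make_cookie(name, value, domain, expires):
    """Build a persistent cookie by hand (normally a server sends these)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=expires, discard=False,
        comment=None, comment_url=None, rest={})


# Hypothetical location, used only for this sketch.
cookie_path = os.path.join(tempfile.mkdtemp(), 'cookies.lwp')

jar = cookielib.LWPCookieJar()
jar.set_cookie(make_cookie('theme', 'dark', 'www.example.com',
                           int(time.time()) + 3600))
jar.save(cookie_path)                     # writes the plain text LWP format

reloaded = cookielib.LWPCookieJar()
reloaded.load(cookie_path)                # reads the cookies back in

# Each cookie exposes its attributes as ordinary instance attributes.
loaded = [(c.name, c.value, c.domain, c.path) for c in reloaded]

f = open(cookie_path)
first_line = f.readline().strip()         # the format's header line
f.close()
```

Iterating over the jar yields Cookie objects, so fields such as expires, secure and discard can be inspected in the same way.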

This may be helpful to those just using ClientCookie as the ClientCookie documentation doesn't appear to document the LWPCookieJar class which is needed for saving and loading cookies.

Python, 89 lines
#!/usr/local/bin/python
# 31-08-04
#v1.0.0 

# cookie_example.py
# An example showing the usage of cookielib (New to Python 2.4) and ClientCookie

# Copyright Michael Foord
# You are free to modify, use and relicense this code.
# No warranty express or implied for the accuracy, fitness to purpose or otherwise for this code....
# Use at your own risk !!!

# If you have any bug reports, questions or suggestions please contact me.
# If you would like to be notified of bugfixes/updates then please contact me and I'll add you to my mailing list.
# E-mail michael AT foord DOT me DOT uk
# Maintained at www.voidspace.org.uk/atlantibots/pythonutils.html

COOKIEFILE = 'cookies.lwp'          # the path and filename that you want to use to save your cookies in
import os.path

cj = None
ClientCookie = None
cookielib = None

try:                                    # Let's see if cookielib is available
    import cookielib            
except ImportError:
    pass
else:
    import urllib2    
    urlopen = urllib2.urlopen
    cj = cookielib.LWPCookieJar()       # This is a subclass of FileCookieJar that has useful load and save methods
    Request = urllib2.Request

if not cookielib:                   # If importing cookielib fails let's try ClientCookie
    try:                                            
        import ClientCookie 
    except ImportError:
        import urllib2
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        urlopen = ClientCookie.urlopen
        cj = ClientCookie.LWPCookieJar()
        Request = ClientCookie.Request
        
####################################################
# We've now imported the relevant library - whichever library is being used urlopen is bound to the right function for retrieving URLs
# Request is bound to the right function for creating Request objects
# Let's load the cookies, if they exist. 
    
if cj is not None:                              # now we have to install our CookieJar so that it is used by the HTTPCookieProcessor in the default opener
    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
    if cookielib:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
    else:
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

# If one of the cookie libraries is available, any call to urlopen will handle cookies using the CookieJar instance we've created
# (Note that if we are using ClientCookie we haven't explicitly imported urllib2)
# as an example :

theurl = 'http://www.diy.co.uk'         # an example url that sets a cookie, try different urls here and see the cookie collection you can make !
txdata = None                                                                           # if we were making a POST type request, we could encode a dictionary of values here - using urllib.urlencode
txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}          # fake a user agent, some websites (like google) don't like automated exploration

try:
    req = Request(theurl, txdata, txheaders)            # create a request object
    handle = urlopen(req)                               # and open it to return a handle on the url
except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
else:
    print 'Here are the headers of the page :'
    print handle.info()                             # handle.read() returns the page, handle.geturl() returns the true url of the page fetched (in case urlopen has followed any redirects, which it sometimes does)

print
if cj is None:
    print "We don't have a cookie library available - sorry."
    print "I can't show you any cookies."
else:
    print 'These are the cookies we have received so far :'
    for index, cookie in enumerate(cj):
        print index, '  :  ', cookie        
    cj.save(COOKIEFILE)                     # save the cookies again (session cookies are skipped unless ignore_discard=True is passed)

We can always tell which import was successful: if we are using cookielib then cookielib != None; if we are using ClientCookie then ClientCookie != None; if we are using neither then cj == None.

Request is the name to use to create Request objects, and urlopen is the name to use to open URLs.

Both names will be bound to the appropriate function whichever library is being used.
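The same "bind the names once, use them everywhere" idea can be sketched a little more compactly. The version below is not the recipe's code: it is a hedged variant that also tries http.cookiejar and urllib.request (the Python 3 names of cookielib and urllib2), and it uses opener.open instead of calling install_opener, so nothing global is changed. The names backend, jarmod and urlmod are invented for this sketch.

```python
cj = None
backend = None            # hypothetical name: records which library was picked

try:
    import cookielib as jarmod                   # Python 2.4 and later 2.x
    import urllib2 as urlmod
    backend = 'cookielib'
except ImportError:
    try:
        import http.cookiejar as jarmod          # Python 3
        import urllib.request as urlmod
        backend = 'http.cookiejar'
    except ImportError:
        try:
            import ClientCookie as jarmod        # add-on for Python 2.2 / 2.3
            urlmod = jarmod                      # ClientCookie supplies both roles
            backend = 'ClientCookie'
        except ImportError:
            import urllib2 as urlmod             # no cookie handling available
            jarmod = None

# Whichever branch ran, these two names now mean the same thing to callers.
Request = urlmod.Request
if jarmod is not None:
    cj = jarmod.LWPCookieJar()
    opener = urlmod.build_opener(urlmod.HTTPCookieProcessor(cj))
    urlopen = opener.open                        # cookie-aware, not installed globally
else:
    urlopen = urlmod.urlopen                     # plain fetching, no cookies
```

Calling urlopen(Request(theurl, txdata, txheaders)) then works the same way in every branch. The recipe's install_opener approach has the advantage that other code calling the library's own urlopen picks up the cookie handling as well.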

Why: I'm writing a CGI proxy called approx.py (see www.voidspace.org.uk/atlantibots/pythonutils.html#cgiproxy ). It remotely fetches webpages for those in a restricted internet environment. If ClientCookie is available it will handle cookies (and works well), including loading and saving a different set of cookies for each user. My server has Python 2.2, but I'd like the script to function well on machines with Python 2.4 or without ClientCookie at all. This code installs a CookieJar and an HTTPCookieProcessor in the default opener used by urlopen if these are available. Otherwise calls to urlopen work as normal.

If the example works as it should, you'll see some page headers printed, followed by the cookie that the server sent you. The cookie should then be saved to the file 'cookies.lwp' (of course, you may need to install ClientCookie first).

Of course, this example also illustrates using Request objects, headers, etc. to fetch webpages.
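The code comments mention that txdata could carry urlencoded values for a POST request. A minimal sketch of that case follows; it only builds the Request object and does not open it. The URL, the form fields and the Referer value are placeholders, and the import fallback to urllib.parse and urllib.request (the Python 3 names) is an addition that is not part of the recipe.

```python
try:
    from urllib import urlencode                 # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode           # Python 3
    from urllib.request import Request

theurl = 'http://www.example.com/login'          # placeholder URL
params = [('user', 'guest'), ('lang', 'en')]     # made-up form fields

# Supplying data turns the request into a POST; Python 3 requires bytes here.
txdata = urlencode(params).encode('ascii')
txheaders = {
    'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Referer': 'http://www.example.com/',        # placeholder referring page
}

req = Request(theurl, txdata, txheaders)
method = req.get_method()                        # 'POST' because data was given
```

Passing req to urlopen would then send the form data in the request body, with any cookies in the installed CookieJar added automatically.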

9 comments

Ian Bicking 19 years, 7 months ago

backporting cookielib. Is cookielib backward compatible to older versions of Python? Or can it be ported if not? This seems easier than dealing with both ClientCookie and cookielib.

Michael Foord (author) 19 years, 7 months ago

Backporting cookielib. The new cookielib uses a modified urllib2 - so it's not as straightforward as just making cookielib available. ClientCookie also has various other 'goodies' that weren't included in cookielib - which is another reason for someone still wanting to use ClientCookie rather than cookielib.

Having said that... it still might be possible. I already have ClientCookie installed on the server I use, and am more than happy with it. The above chunk of code means my script will run with the same functionality on a machine with Python 2.4 and will work fine on a machine with neither.

Nikos Kouremenos 19 years, 7 months ago

This code is magnificent and just works as it should :), and you shouldn't mind both libraries being checked. Nicely done.

some small proposals:

i) very often one needs to spoof the Referer header, so you could have added that header too, besides the User-agent:

txheaders = {'User-agent' : 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)', 'Referer' : refererUrl}

ii) even though you describe it, you don't say exactly how it is done if you want to POST and not GET. something like this in a comment could be better [not sure though, you decide]

params = {'DomainNumber':'0', 'PhoneNo':PHONE_NO, 'Password':PASSWD}

txdata = urllib.urlencode(params)

anyways. excellent code [I voted for you 5 out of 5] and I just put the above stuff here, just if anyone was wondering [as I did]

Michael Foord (author) 19 years, 7 months ago

Thanks. Thanks for the appreciation !

I also like your additional examples.... - Fuzzy

Mikael Norgren 19 years, 6 months ago

Typo? Think there's a lil' typo in the article.

Shouldn't

Request = urlib2.Request

be

Request = urllib2.Request

(urlib2 -> urllib2)?

Nikos Kouremenos 19 years, 4 months ago

yes, it's a typo.. and it became obvious to me too while using Python 2.4 :)

Michael Foord (author) 19 years, 3 months ago

Oops.. Sorry about that... typos belatedly corrected.

Alen Ribic 17 years, 1 month ago

Empty cookies.lwp file when save() called. Hi Michael,

When running cookie_example.py, my cookies.lwp gets updated but it only has the following line in it: "#LWP-Cookies-2.0". I checked the log and I do see the output for: "for index, cookie in enumerate(cj): print index, ' : ', cookie".

Any ideas why the file would be writing just "#LWP-Cookies-2.0" on first line and not the cookie entries?

Regards, -Alen

Vladimir Cambur 16 years, 9 months ago

session cookies. if there are only session cookies you won't see them in the cookies.lwp because by default session cookies are not saved. if you pass ignore_discard=True to save() then they will be saved.
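The behaviour described in this last comment can be reproduced without a network connection. The sketch below assumes a hand-built session cookie (the name 'sid', its value and the domain are made up) and a temporary file; count_saved is a helper invented for the illustration, and the fallback to http.cookiejar (the Python 3 name of cookielib) is an addition, not part of the recipe.

```python
import os
import tempfile

try:
    import cookielib                      # Python 2.4 and later 2.x
except ImportError:
    import http.cookiejar as cookielib    # same module under its Python 3 name

# A session cookie: no expiry time, and discard is set.
session_cookie = cookielib.Cookie(
    version=0, name='sid', value='abc123',
    port=None, port_specified=False,
    domain='www.example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={})

jar = cookielib.LWPCookieJar()
jar.set_cookie(session_cookie)

path = os.path.join(tempfile.mkdtemp(), 'cookies.lwp')


def count_saved(filename):
    """Count the cookie entries written to an LWP cookie file."""
    f = open(filename)
    try:
        return len([line for line in f if line.startswith('Set-Cookie3:')])
    finally:
        f.close()


jar.save(path)                            # default: session cookies are skipped
saved_by_default = count_saved(path)      # only the header line is in the file

jar.save(path, ignore_discard=True)       # now the session cookie is written
saved_with_flag = count_saved(path)

reloaded = cookielib.LWPCookieJar()
reloaded.load(path, ignore_discard=True)  # load() needs the same flag
names = [c.name for c in reloaded]
```

In the recipe this would correspond to calling cj.save(COOKIEFILE, ignore_discard=True) and cj.load(COOKIEFILE, ignore_discard=True), at the cost of keeping cookies that the server intended to last only for one session.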