Use John J. Lee's ClientCookie and ClientForm modules to access password-protected web applications easily. A Yahoo group is used as an example.
import sys
sys.path.append('ClientCookie-1.0.3')
import ClientCookie
sys.path.append('ClientForm-0.1.17')
import ClientForm

# Create special URL opener (for User-Agent) and cookieJar
cookieJar = ClientCookie.CookieJar()
opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cookieJar))
opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible)")]
ClientCookie.install_opener(opener)

# Fetch the login page and parse the forms it contains
fp = ClientCookie.urlopen("http://login.yahoo.com")
forms = ClientForm.ParseResponse(fp)
fp.close()

# print forms on this page
for form in forms:
    print "***************************"
    print form

# Fill in and submit the login form
# (assumes it is the first form on the page; check the output printed above)
form = forms[0]
form["login"] = "yahoo-user-id"  # use your userid
form["passwd"] = "password"      # use your password
fp = ClientCookie.urlopen(form.click())
fp.close()

# The cookie jar now holds the session cookies, so protected pages can be read
fp = ClientCookie.urlopen("http://groups.yahoo.com/group/mygroup")  # use your group
lines = fp.readlines()  # page content, ready for scraping
fp.close()
Many web applications require the user to fill out a login form. This recipe shows a very easy way to do it in Python so that you can get data from the site for scraping purposes.
I simply establish a cookie-backed session with a site (groups.yahoo.com) that requires you to fill out a login form. The recipe should be easily adaptable to other sites such as eBay or PayPal. The task is easy using John J. Lee's ClientCookie and ClientForm modules.
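For readers on Python 3, where ClientCookie's design lives on in the standard library as http.cookiejar, the same idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a port of the recipe: InputCollector, build_session_opener, build_login_request, SAMPLE_PAGE and the example.com URL are all hypothetical names invented for illustration, and the simple parser gathers every named input on the page rather than modelling individual forms the way ClientForm does. Nothing is sent over the network here.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from named <input> tags, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                # A missing value attribute is treated as an empty string.
                self.fields[name] = attributes.get("value") or ""


def build_session_opener(user_agent="Mozilla/5.0 (compatible)"):
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))
    opener.addheaders = [("User-agent", user_agent)]
    return opener, cookie_jar


def build_login_request(action_url, page_html, credentials):
    """Build a POST request from the page's input fields plus the credentials."""
    collector = InputCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    fields.update(credentials)
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(action_url, data=body)


# Hypothetical login page, standing in for the HTML fetched from a real site.
SAMPLE_PAGE = """
<form method="post" action="/login">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="passwd">
  <input type="submit" value="Sign in">
</form>
"""

opener, cookie_jar = build_session_opener()
request = build_login_request(
    "https://example.com/login",  # assumed form action URL
    SAMPLE_PAGE,
    {"login": "yahoo-user-id", "passwd": "password"})
# opener.open(request) would submit the form; later opener.open(...) calls
# would then send back whatever session cookies the site set.
```

Carrying the hidden fields along matters because many login forms include tokens that the server expects to get back. For pages with several forms or more complex controls, a dedicated form-handling library is likely a better fit than this hand-rolled parser.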
After untarring the tar.gz files, I used the script above to access my Yahoo account (Look, Ma! No browser!)
I am using Python version 2.3.4 on Fedora Core 3.
Note that this kind of form-based authentication is nothing like HTTP basic authentication. Therefore, you can't simply put the username and password in the URL as in:
http://username:password@example.com # This does not work.
Refer to Mike Foord's recipe at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305288 to find out how to access sites that use HTTP basic authentication.
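For contrast, here is a rough sketch of what HTTP basic authentication looks like with Python 3's standard urllib.request. It is not taken from the recipe referenced above; build_basic_auth_opener and the example.com URL are hypothetical names chosen for illustration. The credentials are registered with a password manager and sent in a request header when the server asks for them, with no form or cookie involved.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that answers basic-auth challenges under top_level_url."""
    password_manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm at this URL".
    password_manager.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_manager)
    return urllib.request.build_opener(handler), password_manager


opener, password_manager = build_basic_auth_opener(
    "https://example.com/private/",  # assumed protected area
    "username",
    "password")
# opener.open("https://example.com/private/page.html") would retry with an
# Authorization header after the server replies 401 with a basic-auth challenge.
```

A site that uses a login form never issues that 401 challenge, which is why this approach does nothing for Yahoo-style logins.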
All kudos to John J. Lee.