Quickly find out whether a web file exists.
"""
httpExists.py

A quick and dirty way to check whether a web file is there.

Usage:
>>> from httpExists import *
>>> httpExists('http://www.python.org/')
1
>>> httpExists('http://www.python.org/PenguinOnTheTelly')
Status 404 Not Found : http://www.python.org/PenguinOnTheTelly
0
"""

import httplib
import urlparse

def httpExists(url):
    """Return 1 if the URL answers a HEAD request with 200 OK, else 0."""
    host, path = urlparse.urlsplit(url)[1:3]
    found = 0
    try:
        connection = httplib.HTTPConnection(host)   ## make HTTPConnection object
        connection.request("HEAD", path)
        responseOb = connection.getresponse()       ## grab HTTPResponse object
        if responseOb.status == 200:
            found = 1
        else:
            print "Status %d %s : %s" % (responseOb.status, responseOb.reason, url)
    except Exception, e:
        print e.__class__, e, url
    return found

def _test():
    import doctest, httpExists
    return doctest.testmod(httpExists)

if __name__ == "__main__":
    _test()
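(Run the module as python httpExists.py -v to watch the doctest examples execute; doctest.testmod() honours the -v command-line switch when no explicit verbosity is given.)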
I needed to check whether some URLs were valid, and I didn't need all the functionality of webchecker.py, so I wrote this little recipe.
The URL must start with 'http://' due to the way urlparse.urlsplit() interprets URLs.
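To see why, here's what urlsplit() does with and without the scheme; note how the host slot comes back empty when 'http://' is missing:

>>> import urlparse
>>> urlparse.urlsplit('http://www.python.org/index.html')[1:3]
('www.python.org', '/index.html')
>>> urlparse.urlsplit('www.python.org/index.html')[1:3]
('', 'www.python.org/index.html')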
Tags: network
Catch "302: Moved temporarily" I added this to allow for 302 responses, which are processed automatically by most (all?) browsers.
Hmmm... Part of me thinks that either httplib should have stock code for handling 3xx redirections, or urllib should handle HEAD requests. The fact that neither is provided by the standard library is crappy in my opinion.
Here's what I use to handle redirection. Recursion's a bad idea; it should be an iterative loop with a limit to avoid infinite redirection.
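The commenter's snippet isn't preserved here, but a minimal iterative sketch in that spirit might look like the following (httpExistsFollow and the MAX_REDIRECTS cap of 5 are my own names and choices, not part of the original recipe):

import httplib
import urlparse

MAX_REDIRECTS = 5   ## assumed limit; tune to taste

def httpExistsFollow(url):
    """Like httpExists(), but follows up to MAX_REDIRECTS redirections."""
    for _ in range(MAX_REDIRECTS):
        host, path = urlparse.urlsplit(url)[1:3]
        connection = httplib.HTTPConnection(host)
        connection.request("HEAD", path)
        responseOb = connection.getresponse()
        if responseOb.status == 200:
            return 1
        elif responseOb.status in (301, 302, 303, 307):
            location = responseOb.getheader("Location")
            if not location:
                return 0
            url = urlparse.urljoin(url, location)   ## copes with relative Location headers
        else:
            print "Status %d %s : %s" % (responseOb.status, responseOb.reason, url)
            return 0
    print "Too many redirects :", url
    return 0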