This script shows how to resume the download of a file that has been partially downloaded from a web server. It has been tested with Apache 1.3.x, but should work with any web server that understands the "Range" header.
import urllib, os

class myURLOpener(urllib.FancyURLopener):
    """Create sub-class in order to override error 206. This error means a
    partial file is being sent,
    which is ok in this case. Do nothing with this error.
    """
    def http_error_206(self, url, fp, errcode, errmsg, headers, data=None):
        pass

loop = 1
dlFile = "2.6Distrib.zip"
existSize = 0
myUrlclass = myURLOpener()
if os.path.exists(dlFile):
    outputFile = open(dlFile,"ab")
    existSize = os.path.getsize(dlFile)
    #If the file exists, then only download the remainder
    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
else:
    outputFile = open(dlFile,"wb")

webPage = myUrlclass.open("http://localhost/%s" % dlFile)

#If the file exists, but we already have the whole thing, don't download again
if int(webPage.headers['Content-Length']) == existSize:
    loop = 0
    print "File already downloaded"

numBytes = 0
while loop:
    data = webPage.read(8192)
    if not data:
        break
    outputFile.write(data)
    numBytes = numBytes + len(data)

webPage.close()
outputFile.close()

for k,v in webPage.headers.items():
    print k, "=",v
print "copied", numBytes, "bytes from", webPage.url
This script uses an extra header, "Range", to let the web server know that we only want a certain range of data to be downloaded. The server must support this, but it is part of the HTTP/1.1 spec, so it should be widely supported.
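To make the "Range" header concrete, here is a minimal sketch that builds (but does not send) a request for the remainder of a partially downloaded file. It assumes Python 3's urllib.request rather than the Python 2 urllib used in the recipe, and range_request is a hypothetical helper name chosen for illustration.

```python
import os
import urllib.request


def range_request(url, local_path):
    """Build a request asking only for the bytes we do not have yet.

    Hypothetical helper: not part of the original recipe.
    """
    request = urllib.request.Request(url)
    # Size of the partial file on disk, or 0 if nothing was saved yet.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset:
        # "bytes=N-" means: from byte N through the end of the resource.
        request.add_header("Range", "bytes=%d-" % offset)
    return request
```

With 100 bytes already on disk, the request carries "Range: bytes=100-"; with no local file, no Range header is added and the server sends the whole file.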
I essentially use urllib.FancyURLopener to do all the dirty work of adding a new header and doing the normal handshaking. I just had to let it know that the "error" 206 is not really an error, so it should proceed normally.
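In Python 3, urllib.request.urlopen already treats every 2xx status, including 206, as success, so the subclass is only needed with the older opener. What still matters is how the caller reacts to the status code. The following is a sketch under that assumption; write_mode_for_status is a hypothetical helper, and the handling of 416 reflects the usual meaning of that status rather than anything the recipe itself checks.

```python
def write_mode_for_status(status, exist_size):
    """Choose how to open the local file for a given HTTP status.

    Returns "ab" to append, "wb" to start over, or None to skip writing.
    Hypothetical helper: not part of the original recipe.
    """
    if status == 206:
        # Partial Content: the server honoured the Range header.
        return "ab"
    if status == 200:
        # The server sent the whole file (Range ignored or never sent),
        # so appending would corrupt a partial file: start over instead.
        return "wb"
    if status == 416 and exist_size > 0:
        # Range Not Satisfiable: the requested offset is at or past the end
        # of the resource, which usually means the local copy is complete.
        return None
    raise ValueError("unexpected HTTP status: %r" % status)
```

Note that urlopen raises urllib.error.HTTPError for a 416 response, so a caller would pass the exception's code attribute to this function in that case.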
I also do some extra checks to quit the download if I've already downloaded the whole file.
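One way to make the "already complete" check more robust is to read the total size from the Content-Range header that accompanies a 206 response, for example "bytes 100-999/1000". The sketch below assumes that header format; total_size_from_content_range is a hypothetical helper name.

```python
import re


def total_size_from_content_range(value):
    """Return the total resource size from a Content-Range value, or None.

    Handles values such as "bytes 100-999/1000" and "bytes */1000".
    A total of "*" (unknown) or an unparseable value yields None.
    Hypothetical helper: not part of the original recipe.
    """
    match = re.match(r"\s*bytes\s+(?:\d+-\d+|\*)/(\d+|\*)\s*$", value or "")
    if not match or match.group(1) == "*":
        return None
    return int(match.group(1))
```

If the size of the local file equals the returned total, there is nothing left to download.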
Check out the HTTP/1.1 RFC to learn more about what the headers mean. The script should probably check that the web server accepts "Range" requests, but that is pretty simple to do.
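A possible form of that check, as a hedged sketch: servers that support byte ranges commonly advertise it with an "Accept-Ranges: bytes" response header. The header is optional, so its absence does not prove that ranges are unsupported; treat the result as a hint. advertises_byte_ranges is a hypothetical helper name.

```python
def advertises_byte_ranges(headers):
    """Return True if the response headers advertise byte-range support.

    `headers` is any mapping with a .get() method, such as the headers
    object of a urllib response. A plain dict also works, but its keys
    are case-sensitive, so the key must be spelled "Accept-Ranges".
    Hypothetical helper: not part of the original recipe.
    """
    value = headers.get("Accept-Ranges", "") or ""
    # The header may list several units; "none" means ranges are refused.
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units
```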
Thanks, but sometimes it doesn't work. I downloaded the URL "http://news.sina.com.cn/old1000/news1000_20050702.shtml" (a page of about 1.1 MB) on Windows 2000 with Python 2.4. Sometimes it resumes, and sometimes it downloads from the beginning; I don't know why.
Thanks! It works for me.
Great example! Thanks!
How can I implement download status, pausing, and resuming in this code?
How can I get the filename of the file being downloaded automatically?