This script shows how to resume a file download that was only partially completed from a web server. It has been tested with Apache 1.3.x, but should work with any web server that understands the "Range" header.

Python, 43 lines
import urllib, os

class myURLOpener(urllib.FancyURLopener):
    """Subclass FancyURLopener so that HTTP status 206 is not treated as
       an error.  206 (Partial Content) means the server is sending only
       part of the file, which is what a Range request asks for.
    """
    def http_error_206(self, url, fp, errcode, errmsg, headers, data=None):
        pass

loop = 1
dlFile = "2.6Distrib.zip"
existSize = 0
myUrlclass = myURLOpener()
if os.path.exists(dlFile):
    outputFile = open(dlFile,"ab")
    existSize = os.path.getsize(dlFile)
    #If the file exists, then only download the remainder
    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
else:
    outputFile = open(dlFile,"wb")

webPage = myUrlclass.open("http://localhost/%s" % dlFile)

#If we already have the whole thing, don't download again (assumes Content-Length is the full file size)
if int(webPage.headers['Content-Length']) == existSize:
    loop = 0
    print "File already downloaded"

numBytes = 0
while loop:
    data = webPage.read(8192)
    if not data:
        break
    outputFile.write(data)
    numBytes = numBytes + len(data)

webPage.close()
outputFile.close()

for k,v in webPage.headers.items():
    print k, "=",v
print "copied", numBytes, "bytes from", webPage.url

This script uses an extra header, "Range", to tell the web server that we only want a certain range of bytes. The server must support this. Range requests are part of the HTTP/1.1 spec, so support should be widespread, but a server is allowed to ignore the header and send the whole file.
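As a rough illustration of the header itself, here is a minimal Python 3 sketch (the recipe above is Python 2). It builds a request that asks only for the bytes missing from a local file. The name `build_resume_request` is a hypothetical helper for this example, not part of the recipe or of urllib.

```python
import os
import urllib.request


def build_resume_request(url, local_path):
    """Return (request, offset) for downloading url into local_path.

    offset is the number of bytes already on disk. When it is non-zero
    the request carries a Range header asking for the rest of the file.
    """
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # "bytes=N-" means: everything from byte N to the end of the file.
        request.add_header("Range", "bytes=%d-" % offset)
    return request, offset
```

The returned request can be passed to `urllib.request.urlopen`. Nothing is sent over the network until then.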

I essentially use urllib.FancyURLopener to do all the dirty work of adding the new header and doing the normal handshaking. I just had to let it know that the "error" 206 is not really an error, so it should proceed normally.
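In Python 3, `urllib.request.urlopen` treats every 2xx status, including 206, as success, so no subclass is needed there. What still matters is telling 206 apart from 200: a server that ignores Range replies 200 with the whole file, and appending that to a partial file would corrupt it. The following is a sketch under that assumption. `choose_file_mode` and `save_response` are hypothetical names, and `response` is assumed to be any object with a `status` attribute and a `read(n)` method, such as the object `urlopen` returns for an HTTP URL.

```python
def choose_file_mode(status, offset):
    """Return the open() mode for the local file.

    206 (Partial Content) means the server honoured the Range header, so
    the body continues where the local file ends: append. Any other
    success status (typically 200) carries the whole file: overwrite.
    """
    if status == 206 and offset > 0:
        return "ab"
    return "wb"


def save_response(response, local_path, offset, chunk_size=8192):
    """Copy the response body to local_path and return the bytes written."""
    written = 0
    with open(local_path, choose_file_mode(response.status, offset)) as output:
        while True:
            data = response.read(chunk_size)
            if not data:
                break
            output.write(data)
            written += len(data)
    return written
```

Here `offset` is the size of the partial file at the time the Range header was built.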

I also do some extra checks to quit the download if I've already downloaded the whole file.
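One caveat with that check: on a 206 reply, Content-Length is the size of the part being sent, not of the whole file. The total size appears in the Content-Range header instead, for example `bytes 100-199/200`. A server that honours ranges typically answers a request starting at or past the end of the file with status 416 and a value like `bytes */200`. Below is a hedged Python 3 sketch that reads the total from Content-Range. Both function names are made up for illustration.

```python
import re


def total_size_from_content_range(value):
    """Return the total size from a Content-Range value, or None.

    Accepts forms such as 'bytes 100-199/200' and 'bytes */200'.
    Returns None when the total is '*' (unknown) or the value is malformed.
    """
    match = re.match(r"bytes\s+(?:\d+-\d+|\*)/(\d+|\*)$", value.strip())
    if not match or match.group(1) == "*":
        return None
    return int(match.group(1))


def is_complete(existing_size, content_range):
    """True if the local file already holds at least the reported total."""
    total = total_size_from_content_range(content_range)
    return total is not None and existing_size >= total
```

With `urllib.request.urlopen`, a 416 reply arrives as a `urllib.error.HTTPError`. Its `headers` attribute should carry the Content-Range value to pass in, though a server is not required to send one.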

Check out the HTTP/1.1 RFC to learn more about what the headers mean. The script should probably also check that the web server accepts range requests, for example by looking for an "Accept-Ranges: bytes" response header, but that is pretty simple to do.
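A minimal Python 3 sketch of such a check, assuming the server advertises range support through the Accept-Ranges response header. The names `server_accepts_byte_ranges` and `probe_range_support` are hypothetical. `headers` can be any mapping with a `get` method, such as the `headers` attribute of a urllib response.

```python
import urllib.request


def server_accepts_byte_ranges(headers):
    """True if the Accept-Ranges header lists the 'bytes' unit."""
    value = headers.get("Accept-Ranges", "")
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units


def probe_range_support(url):
    """Send a HEAD request to url and inspect its Accept-Ranges header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return server_accepts_byte_ranges(response.headers)
```

A missing Accept-Ranges header does not prove that ranges are unsupported, since a server may honour Range without advertising it. Checking for status 206 on the actual download reply remains the more reliable test.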

5 comments

saddle saddle 18 years, 9 months ago

Thanks, but sometimes it doesn't work. I download the URL "http://news.sina.com.cn/old1000/news1000_20050702.shtml"; the page is about 1.1 MB. On Windows 2000 with Python 2.4, sometimes it resumes and sometimes it downloads from the beginning. I don't know why.

skyuuka 12 years, 7 months ago

Thanks! it works for me.

Jurek Kedra 11 years, 2 months ago

Great example! Thanks!

Mohammed Faheem 7 years, 1 month ago

How can I implement download status, pausing, and resuming in this code?

Mohammed Faheem 7 years, 1 month ago

How can I get the filename of the file being downloaded automatically?