
Resuming download of a file (Python recipe) by Chris Moffitt
ActiveState Code (http://code.activestate.com/recipes/83208/)

This script shows how to resume the download of a file that has been only partially downloaded from a web server. It has been tested with Apache 1.3.x, but should work with any web server that understands the "Range" header.

import urllib, os

class myURLOpener(urllib.FancyURLopener):
    """Create a subclass in order to override the handling of status 206.
       A 206 means that a partial file is being sent, which is what we
       want in this case, so do nothing with it.
    """
    def http_error_206(self, url, fp, errcode, errmsg, headers, data=None):
        pass

loop = 1
dlFile = "2.6Distrib.zip"
existSize = 0
myUrlclass = myURLOpener()
if os.path.exists(dlFile):
    outputFile = open(dlFile,"ab")
    existSize = os.path.getsize(dlFile)
    #If the file exists, then only download the remainder
    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
else:
    outputFile = open(dlFile,"wb")

webPage = myUrlclass.open("http://localhost/%s" % dlFile)

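#If the file exists, but we already have the whole thing, don't download again.
#Note: in a 206 reply Content-Length is the size of the part being sent, not of
#the whole file, so this test is only reliable when the server ignored "Range".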
if int(webPage.headers['Content-Length']) == existSize:
    loop = 0
    print "File already downloaded"

numBytes = 0
while loop:
    data = webPage.read(8192)
    if not data:
        break
    outputFile.write(data)
    numBytes = numBytes + len(data)

webPage.close()
outputFile.close()

for k,v in webPage.headers.items():
    print k, "=",v
print "copied", numBytes, "bytes from", webPage.url

      

This script uses an extra header, "Range", to tell the web server that we only want a certain range of the data. The server must support this. Range requests are part of the HTTP/1.1 spec, so support is widespread, but a server is allowed to ignore the header and send the whole file.
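The recipe is written for Python 2. As a rough sketch of the same "Range" idea in Python 3, the helper below builds the request with urllib.request instead of FancyURLopener. The function name build_resume_request and the example URL are made up for illustration and are not part of the original recipe.

```python
import os
import urllib.request


def build_resume_request(url, local_path):
    """Return a Request that asks only for the bytes missing from local_path."""
    request = urllib.request.Request(url)
    # Size of what is already on disk; 0 when nothing has been downloaded yet.
    existing = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if existing:
        # "bytes=N-" means "send everything from offset N to the end".
        request.add_header("Range", "bytes=%d-" % existing)
    return request
```

Passing the returned object to urllib.request.urlopen() would send the request. No Range header is added for a missing or empty file, so the first attempt is an ordinary full download.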

I essentially use the urllib.FancyURLopener to do all the dirty work of adding a new header and doing the normal handshaking. I just had to let it know that the "error" 206 is not really an error - just continue to proceed normally.
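In Python 3, urllib.request.urlopen() already returns any 2xx response, including 206, as a normal response, so no subclass is needed there. What still matters is reacting to the status code. The sketch below shows one way to map it to a file mode. The function name choose_write_mode and the decision to treat 416 as "nothing left to fetch" are assumptions of this example, not something the original script does.

```python
def choose_write_mode(status, existing_size):
    """Map an HTTP status code to the mode for opening the local file.

    Returns "ab" to append, "wb" to start over, or None when there is
    nothing more to write.
    """
    if status == 206:
        # Partial content: the server honoured the Range header.
        return "ab"
    if status == 200:
        # Full body: the Range header was absent or ignored, so appending
        # would corrupt the file. Start again from the beginning.
        return "wb"
    if status == 416 and existing_size > 0:
        # Range not satisfiable: the requested offset is at or past the end
        # of the remote file, which usually means the local copy is complete.
        return None
    raise ValueError("unexpected HTTP status: %d" % status)
```

The 200 branch covers servers that ignore "Range" and resend the whole file. Opening the file in "wb" mode in that case avoids appending a full copy to a partial one. Note that urlopen() raises urllib.error.HTTPError for a 416, so the status would come from that exception's code attribute.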

I also do some extra checks to quit the download if I've already downloaded the whole file.
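One way to make the "already downloaded" check more robust is to read the total size from the "Content-Range" header, which a server sends with a 206 reply (for example "bytes 100-999/1000") and usually with a 416 reply ("bytes */1000"). The helpers below are a minimal sketch in Python 3. The names total_size and already_complete are hypothetical, and the sketch assumes the server reports a numeric total.

```python
import re

# Matches "bytes 100-999/1000" (206) as well as "bytes */1000" (416).
_CONTENT_RANGE = re.compile(r"^bytes\s+(?:\d+-\d+|\*)/(\d+)$")


def total_size(content_range):
    """Return the total size named in a Content-Range value, or None."""
    if not content_range:
        return None
    match = _CONTENT_RANGE.match(content_range.strip())
    # An unknown total ("bytes 0-9/*") does not match and yields None.
    return int(match.group(1)) if match else None


def already_complete(existing_size, content_range):
    """True when the local file is at least as large as the remote total."""
    total = total_size(content_range)
    return total is not None and existing_size >= total
```

With these, a call such as already_complete(os.path.getsize(dlFile), headers.get("Content-Range")) could replace the comparison against Content-Length.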

Check out the HTTP/1.1 RFC to learn more about what the headers mean. The script should probably check that the web server accepts "Range" requests, but that is pretty simple to do.
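A sketch of such a check in Python 3: servers that support byte ranges normally advertise it with an "Accept-Ranges: bytes" response header. The function name accepts_byte_ranges is made up for this example, and it assumes it is given a headers object with a get() method, such as the headers attribute of a urllib.request response.

```python
def accepts_byte_ranges(headers):
    """Return True when the response headers advertise byte-range support."""
    value = headers.get("Accept-Ranges", "") or ""
    # The header holds a comma-separated list of range units, or "none".
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units
```

A False result is only a hint. A server may honour "Range" without advertising it, so the status code of the actual reply (206 versus 200) remains the authoritative signal.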

Tags: web

5 comments

saddle saddle 18 years, 10 months ago

Thanks, but sometimes it doesn't work. I downloaded the URL "http://news.sina.com.cn/old1000/news1000_20050702.shtml"; the page is about 1.1M. On Windows 2000 with Python 2.4, sometimes it resumes and sometimes it downloads from the beginning. I don't know why.

skyuuka 12 years, 8 months ago

Thanks! It works for me.

Jurek Kedra 11 years, 3 months ago

Great example! Thanks!

Mohammed Faheem 7 years, 2 months ago

How can I implement download status, pausing, and resuming in this code?

Mohammed Faheem 7 years, 2 months ago

How can I get the filename of the file being downloaded automatically?

Created by Chris Moffitt on Wed, 24 Oct 2001 (PSF)


Required Modules

urllib
os

Other Information and Tasks

Licensed under the PSF License
Revision 2 (updated 22 years ago)
