
A generator that provides a quick way to 'walk' a zip file archive. The generator also handles nested zip archives, i.e., zip files that contain other zip files. Inspired by the os.walk function.

Python
import io
import os
import sys
import zipfile


def zipwalk(zfilename):
    """Zip file tree generator.

    For each file entry in a zip archive, this yields
    a two-tuple of the zip information and the data
    of the file as a BytesIO object.

    zipinfo, filedata

    zipinfo is an instance of the zipfile.ZipInfo class,
    which describes the file contained in the zip archive.
    filedata is an io.BytesIO instance holding the actual
    file data.

    If the file is itself a zip file, the generator opens
    it and walks its contents instead of yielding it.

    zfilename may be a path or a seekable binary file object.

    Inspired by os.walk.
    """
    try:
        # zipfile.BadZipFile (the old zipfile.error) propagates to the caller.
        with zipfile.ZipFile(zfilename, 'r') as z:
            for info in z.infolist():
                fname = info.filename
                data = z.read(info)
                extn = os.path.splitext(fname)[1].lower()

                # The entry's bytes are already in memory, so a nested
                # archive is walked directly; no temporary file is written.
                if extn == '.zip' and zipfile.is_zipfile(io.BytesIO(data)):
                    for x in zipwalk(io.BytesIO(data)):
                        yield x
                else:
                    # Regular files, and '.zip' entries that are not valid archives.
                    yield (info, io.BytesIO(data))
    except RuntimeError as e:
        # Raised by zipfile, e.g. for an encrypted entry with no password set.
        print('Runtime Error:', e, file=sys.stderr)


if __name__ == "__main__":
    for info, filedata in zipwalk(sys.argv[1]):
        print(info.filename)
    
    

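I have to write Python scripts at work to process zip files which sometimes contain child zip files. I find this recipe very useful for working with such nested zip archives. A generator is quite handy here, since entries are produced one at a time as the archive is walked instead of being collected into a list first. Note that the generator still recurses once per nesting level, so an extremely deeply nested archive can still run into Python's recursion limit.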
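If recursion depth is a concern, one way to sidestep it is to keep the open archives on an explicit stack instead of calling the generator recursively. The following is only a sketch under a few assumptions: it targets Python 3, it reads nested archives from memory with io.BytesIO, and the names zipwalk_iterative and make_zip are hypothetical helpers made up for this illustration, not part of the recipe.

```python
import io
import zipfile


def zipwalk_iterative(zfile):
    """Yield (ZipInfo, BytesIO) for every non-zip entry, without recursion.

    Hypothetical helper: open archives are kept on an explicit stack, so
    the nesting depth is not bounded by the interpreter's call stack.
    """
    root = zipfile.ZipFile(zfile, 'r')
    stack = [(root, iter(root.infolist()))]
    try:
        while stack:
            archive, entries = stack[-1]
            info = next(entries, None)
            if info is None:
                # This archive is exhausted; resume its parent.
                archive.close()
                stack.pop()
                continue
            data = archive.read(info)
            is_nested = (info.filename.lower().endswith('.zip')
                         and zipfile.is_zipfile(io.BytesIO(data)))
            if is_nested:
                # Descend into the nested archive on the next iteration.
                child = zipfile.ZipFile(io.BytesIO(data), 'r')
                stack.append((child, iter(child.infolist())))
            else:
                yield info, io.BytesIO(data)
    finally:
        # Close anything still open if the caller stops iterating early.
        for archive, _ in stack:
            archive.close()


def make_zip(entries):
    """Build a zip archive in memory from {name: bytes}; return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as z:
        for name, payload in entries.items():
            z.writestr(name, payload)
    return buf.getvalue()


# Demo data: an archive with a second archive stored between two files.
inner = make_zip({'inner.txt': b'inside'})
outer = make_zip({'a.txt': b'hello', 'nested.zip': inner, 'b.txt': b'bye'})

names = [info.filename for info, _ in zipwalk_iterative(io.BytesIO(outer))]
print(names)  # prints ['a.txt', 'inner.txt', 'b.txt']
```

The entries come out in the same depth-first order as with the recursive generator: the contents of nested.zip appear at the position where the nested archive sits in its parent.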

1 comment

Roger 9 years ago

Finally found this. Thanks !