
This recipe uses the win32file.FindFilesW() function from pywin32 to efficiently calculate the total size of a folder or volume. It can also stop early once a cutoff size is exceeded, and it records errors encountered along the way without aborting the search.

Python, 154 lines
import win32file as _win32file
import sys as _sys


class FolderSize(object):
    """
    This class implements an efficient technique for
    retrieving the size of a given folder or volume when
    some action depends on that size.

    Besides computing a folder's full size, the class can
    watch for a specific threshold (the "stop size") and
    end the search as soon as the running total exceeds
    it. When a small number of bytes is enough to decide
    (for example, whether a folder holds any data at all),
    this can save a great deal of time compared with
    waiting for the entire size.

    In addition, problems encountered during the search,
    such as permission errors, are captured so that they
    can be investigated afterwards. These errors do not
    stop the search; the total size is still returned,
    but it excludes the contents of any folder that could
    not be read.

    Each new search resets the errors and total size from
    the previous search; the stop size, however, persists
    until it is changed.
    """

    def __init__(self):

        # This is the total size returned, in bytes. If a stop size
        # is enabled and exceeded, this is the running total at the
        # moment the search stopped, not the folder's full size.
        self.totalSize = 0

        # This mapping holds any errors that have occurred
        # during the search. The key is the path name, and
        # its value is the error message as a string.
        self.errors = {}

        # This is the size at which the search ends. The default
        # of -1 means there is no stop size.
        self._stopSize = -1

        # When true, each folder path is printed as it is visited.
        self.verbose = False

    def enableStopSize(self, size=0):
        """
        This public method enables the stop size
        criterion. Once the number of bytes counted so
        far exceeds this size, the search stops.

        The default of zero bytes means the search ends
        as soon as any non-empty file is found.
        """

        # Raise instead of exiting the interpreter, so callers
        # can handle a bad argument themselves.
        if not isinstance(size, int):
            raise TypeError("size must be an integer")
        if size < 0:
            raise ValueError("size must be zero or greater")

        self._stopSize = size

    def disableStopSize(self):
        """
        This public method disables the stop size
        criteria. When disabled, the total size of
        a folder is retrieved.
        """

        self._stopSize = -1

    def showStopSize(self):
        """
        This public method displays the current
        stop size in bytes (-1 means no stop size).
        """

        print(self._stopSize)

    def searchPath(self, path):
        """
        This public method initiates the process
        of retrieving size data. It accepts either
        a UNC or local drive path.
        """

        # Reset the values on every new invocation.
        self.totalSize = 0
        self.errors = {}

        self._getSize(path)

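    def _getSize(self, path):
        """
        This private method recursively adds up the sizes
        of all files in a folder or volume, and accepts a
        UNC or local path.
        """

        if self.verbose:
            print(path)

        # Get the list of files and folders. Each item is a
        # WIN32_FIND_DATA tuple.
        try:
            items = _win32file.FindFilesW(path + "\\*")
        except _win32file.error as details:
            # Record the Windows error message and skip this folder.
            self.errors[path] = str(details.args[-1])
            return

        # Add the size or perform recursion on folders.
        for item in items:

            attr = item[0]
            name = item[-2]
            # The size is split across nFileSizeHigh (item[4]) and
            # nFileSizeLow (item[5]); combine both halves so files
            # of 4 GiB or more are counted correctly.
            size = (item[4] << 32) + item[5]

            # 0x10 is FILE_ATTRIBUTE_DIRECTORY.
            if attr & 0x10:
                if name not in (".", ".."):
                    self._getSize("%s\\%s" % (path, name))

            self.totalSize += size

            # Stop once the running total exceeds the stop size.
            # Returning here makes each calling level return too,
            # because every level runs this same check.
            if self._stopSize > -1 and self.totalSize > self._stopSize:
                return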


if __name__ == "__main__":

    # Get the size of the entire folder.
    sizer = FolderSize()
    sizer.searchPath(r"d:\users1\jsmith")
    print(sizer.totalSize)

    # Enable a stop size (in bytes); the default is zero if no argument is given.
    sizer.enableStopSize(1024)
    sizer.searchPath(r"d:\users1\jsmith")
    if sizer.totalSize > 1024:
        print("The folder meets the criteria.")
    elif sizer.totalSize == 0:
        print("The folder is empty.")
    else:
        print("The folder has some data but can be skipped.")

    # A total of zero can also mean the starting path itself could not
    # be read, so check for errors before trusting it. Any other errors
    # come from subfolders.
    if sizer.totalSize == 0 and sizer.errors:
        print(sizer.errors)

At my job, I needed to determine whether a given folder had data in it. The criterion was therefore that anything greater than zero bytes would be flagged.
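Because the criterion is simply "more than zero bytes", no running total is really needed: the first non-empty file settles the question. The following is a minimal portable sketch of that short-circuit using the standard library's os.scandir() (Python 3.6+) rather than pywin32. It is not the recipe's method; iter_file_sizes and folder_has_data are hypothetical names, and silently skipping unreadable folders is a simplification (the recipe's class records them instead).

```python
import os


def iter_file_sizes(root):
    """Yield the size of every file under root (hypothetical helper).

    Directories are walked with an explicit stack instead of recursion.
    A directory that cannot be read is skipped from the point of failure.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        yield entry.stat(follow_symlinks=False).st_size
        except OSError:
            pass  # unreadable: move on to the next directory on the stack


def folder_has_data(root):
    """Return True as soon as any non-empty file is found under root."""
    # any() stops pulling from the generator at the first size > 0,
    # so the rest of the tree is never listed.
    return any(size > 0 for size in iter_file_sizes(root))
```

A generator keeps the "stop early" logic out of the walker entirely: the caller decides when to stop simply by no longer asking for sizes.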

Neither os.listdir() nor os.walk() (without getting into details) was fast enough for gigabytes of data. I then switched to the Size property of the COM FileSystemObject (FileSystemObject().GetFolder().Size) to get the total size, which was fast but did not work when errors occurred. It was also less flexible: I had to wait for the total size before moving on to the next root path.

After sniffing the traffic to see how the FSO worked, I noticed it used the Win32 API's FindFirstFile* functions, which are much more efficient for calculating size and are already wrapped by win32file.FindFilesW(), courtesy of Mark Hammond and friends. (I didn't realize this before, or forgot about it if I did.)
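One detail of the FindFirstFile* data matters when summing sizes: Windows reports each file's size as two 32-bit halves. In the WIN32_FIND_DATA tuples returned by win32file.FindFilesW(), nFileSizeHigh is item 4 and nFileSizeLow is item 5, so the full size is (high << 32) + low, and code that reads only the low half undercounts any file of 4 GiB or more. The arithmetic, with hypothetical helper names:

```python
def combine_file_size(size_high, size_low):
    """Rebuild a 64-bit size from nFileSizeHigh and nFileSizeLow."""
    return (size_high << 32) | size_low


def split_file_size(size):
    """Split a size into (high, low) 32-bit halves, as Windows reports it."""
    return size >> 32, size & 0xFFFFFFFF


five_gib = 5 * 1024 ** 3
high, low = split_file_size(five_gib)
print(high, low)                                 # 1 1073741824
print(combine_file_size(high, low) == five_gib)  # True
print(low == five_gib)                           # False: the low half alone is just 1 GiB
```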

I wrapped a class around the win32file functions and added two features: error handling, so that the search continues past folders it cannot read, and a cutoff size, so that the full total isn't computed when only a very small threshold matters (such as zero bytes).
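The same two features can be sketched without pywin32 using os.scandir() from the standard library (Python 3.6+). This is a swapped-in technique, not the recipe's code: the class name ScandirFolderSize is hypothetical, and its speed relative to FindFilesW() is not measured here. On Windows, os.scandir() is itself built on FindFirstFileW/FindNextFileW (PEP 471), and DirEntry.stat(follow_symlinks=False) is normally answered from that listing without an extra system call. Not following symbolic links is an assumption made to avoid loops and double counting.

```python
import os


class ScandirFolderSize(object):
    """Hypothetical portable sketch of the FolderSize design using os.scandir().

    totalSize and errors play the same roles as in the recipe's class, and a
    stop size of -1 means "no stop size".
    """

    def __init__(self, stop_size=-1):
        self.totalSize = 0
        self.errors = {}
        self._stopSize = stop_size

    def searchPath(self, path):
        # Reset the results on every new search; the stop size persists.
        self.totalSize = 0
        self.errors = {}
        self._getSize(path)

    def _limitReached(self):
        return self._stopSize > -1 and self.totalSize > self._stopSize

    def _getSize(self, path):
        try:
            # Read the whole listing, then close the handle before recursing.
            with os.scandir(path) as it:
                entries = list(it)
        except OSError as exc:
            # Record the problem and skip this folder; the search goes on.
            self.errors[path] = exc.strerror or str(exc)
            return

        for entry in entries:
            try:
                # Symlinks are not followed; a link counts as its own size.
                if entry.is_dir(follow_symlinks=False):
                    self._getSize(entry.path)
                else:
                    self.totalSize += entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                self.errors[entry.path] = exc.strerror or str(exc)

            # Returning here makes every caller return after its own check.
            if self._limitReached():
                return
```

Reading each directory fully with list() before recursing keeps only one directory handle open at a time, at the cost of holding that directory's entries in memory.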

Created by Higinio Cachola on Thu, 1 May 2008 (PSF)