This recipe uses the win32file.FindFilesW() function to efficiently calculate the total size of a folder or volume. It also supports an optional cutoff size and records errors encountered along the way instead of aborting the search.
import win32file as _win32file
import sys as _sys


class FolderSize:
    """
    This class implements an efficient technique for
    retrieving the size of a given folder or volume in
    cases where some action is needed based on a given
    size.

    The implementation is designed to handle situations
    where a specific size is desired to watch for,
    in addition to a total size, before a subsequent
    action is taken. This dramatically improves
    performance where only a small number of bytes
    are sufficient to call off a search instead of
    waiting for the entire size.

    In addition, the design is set to handle problems
    encountered at points during the search, such as
    permission errors. Such errors are captured so that
    a user could further investigate the problem and why
    it occurred. These errors do not stop the search from
    completing; the total size returned is still provided,
    minus the size from folders with errors.

    When calling a new search, the errors and total size
    from the previous search are reset; however, the stop
    size persists unless changed.
    """

    def __init__(self):
        # This is the total size returned. If a stop size
        # is provided, then the total size is the running
        # count at the point where the stop size was exceeded.
        self.totalSize = 0

        # This mapping holds any errors that have occurred
        # during the search. The key is the path name, and
        # its value is a string of the error itself.
        self.errors = {}

        # This is the size where the search will end. The default
        # is -1 and it represents no stop size.
        self._stopSize = -1

        # This prints verbose information on path names.
        self.verbose = 0

    def enableStopSize(self, size=0):
        """
        This public method enables the stop size
        criteria. If the number of bytes thus far
        calculated exceeds this size, the search is
        stopped.

        The default value is zero bytes and means anything
        greater will end the search.
        """
        if not isinstance(size, int):
            print("Error: size must be an integer")
            _sys.exit(1)

        self._stopSize = size

    def disableStopSize(self):
        """
        This public method disables the stop size
        criteria. When disabled, the total size of
        a folder is retrieved.
        """
        self._stopSize = -1

    def showStopSize(self):
        """
        This public method displays the current
        stop size in bytes.
        """
        print(self._stopSize)

    def searchPath(self, path):
        """
        This public method initiates the process
        of retrieving size data. It accepts either
        a UNC or local drive path.
        """
        # Reset the values on every new invocation.
        self.totalSize = 0
        self.errors = {}

        self._getSize(path)

    def _getSize(self, path):
        """
        This private method calculates the total size
        of a folder or volume, and accepts a UNC or
        local path.
        """
        if self.verbose:
            print(path)

        # Get the list of files and folders.
        try:
            items = _win32file.FindFilesW(path + "\\*")
        except _win32file.error as details:
            # Keep the last element of the error (the message text).
            self.errors[path] = str(details.args[-1])
            return

        # Add the size or perform recursion on folders.
        for item in items:
            attr = item[0]
            name = item[-2]
            # item[4] and item[5] hold the high and low 32 bits of the
            # file size; combine both so files over 4 GB count correctly.
            size = ((item[4] & 0xFFFFFFFF) << 32) + (item[5] & 0xFFFFFFFF)

            # 16 is the directory attribute bit.
            if attr & 16:
                if name != "." and name != "..":
                    self._getSize("%s\\%s" % (path, name))

            self.totalSize += size

            # Checked after every item, so an exceeded stop size
            # also unwinds the enclosing recursion levels.
            if self._stopSize > -1:
                if self.totalSize > self._stopSize:
                    return


if __name__ == "__main__":
    # Get the size of entire folder.
    sizer = FolderSize()
    sizer.searchPath(r"d:\users1\jsmith")
    print(sizer.totalSize)

    # Enable stop size (in bytes). Default is zero if no arg provided.
    sizer.enableStopSize(1024)
    sizer.searchPath(r"d:\users1\jsmith")

    if sizer.totalSize > 1024:
        print("The folder meets the criteria.")
    elif sizer.totalSize == 0:
        print("The folder is empty.")
    else:
        print("The folder has some data but can be skipped.")

    # If the total size is zero, make sure no errors have occurred.
    # It may mean the initial path failed. Otherwise, errors are always from
    # subfolders.
    if sizer.totalSize == 0 and sizer.errors:
        print(sizer.errors)
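To check several root folders against one threshold, the class can be driven in a loop. The following is only a sketch: it assumes an object exposing the enableStopSize(), searchPath(), totalSize and errors members shown above, while the helper name classify_folders and the labels "over", "under" and "failed" are hypothetical and not part of the recipe.

```python
def classify_folders(sizer, paths, stop_size=0):
    """Label each root path as 'over', 'under' or 'failed'.

    `sizer` is any object with the FolderSize interface from the recipe.
    This helper and its labels are illustrative assumptions only.
    """
    sizer.enableStopSize(stop_size)
    results = {}
    for path in paths:
        sizer.searchPath(path)  # resets totalSize and errors each time
        if sizer.totalSize > stop_size:
            results[path] = "over"
        elif path in sizer.errors:
            # The root path itself could not be listed.
            results[path] = "failed"
        else:
            # At or below the threshold; subfolder errors, if any,
            # remain available in sizer.errors for inspection.
            results[path] = "under"
    return results
```

On Windows this would be called as classify_folders(FolderSize(), [r"d:\users1\jsmith"]). A result of "under" with a non-empty errors mapping deserves a second look, since unreadable subfolders contribute nothing to the total.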
At my job, I needed to determine whether a given folder had data in it. The criterion was therefore that any folder holding more than zero bytes would be flagged.
Neither os.listdir() nor os.walk() (without getting into details) was fast enough for gigabytes of data. I then started using the Size property of the COM component FileSystemObject().GetFolder() to get the total size, which had good speed but didn't work when errors occurred. The component also wasn't as flexible, in that I had to wait for the total size before moving on to the next root path.
Having sniffed the traffic to inspect how the FSO worked, I noticed that it used the Win32 API's FindFirstFile* functions, which are much more efficient for calculating size and are already exposed through win32file.FindFilesW() by Mark Hammond and friends. (I didn't realize this before, or forgot about it if I did.)
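The recipe indexes the tuples that FindFilesW() returns by position, which is easy to misread. The sketch below names the fields it relies on, assuming the tuple layout documented for pywin32's WIN32_FIND_DATA (attributes at index 0, the high and low 32 bits of the size at indexes 4 and 5, the file name at index 8). The helper name entry_info is hypothetical, and the sample tuple is hand-built rather than real FindFilesW() output.

```python
FILE_ATTRIBUTE_DIRECTORY = 0x10  # the bit the recipe tests with `attr & 16`


def entry_info(item):
    """Return (name, is_dir, size) for one FindFilesW()-style tuple.

    Assumed layout: item[0] attributes, item[4] high 32 bits of the size,
    item[5] low 32 bits of the size, item[8] file name.
    """
    attributes, size_high, size_low, name = item[0], item[4], item[5], item[8]
    is_dir = bool(attributes & FILE_ATTRIBUTE_DIRECTORY)
    # Masking guards against either half arriving as a signed 32-bit value.
    size = ((size_high & 0xFFFFFFFF) << 32) + (size_low & 0xFFFFFFFF)
    return name, is_dir, size


# Hand-built stand-in for one entry: a 5 GiB file (4 GiB high part + 1 GiB low).
sample = (0x20, None, None, None, 1, 1073741824, 0, 0, "big.iso", "")
print(entry_info(sample))
```

Combining both halves matters only for files of 4 GiB or more, but ignoring the high half would silently undercount such files.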
I wrapped a class around the win32file functions and added features such as error handling, so that the search can continue past failures, and a cutoff size, so that the full total need not be computed when only a very small threshold matters (like zero bytes).
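For readers without pywin32, the same two ideas (record errors and keep going, stop as soon as a cutoff is exceeded) can be sketched with the standard library. This is a portable analogue, not the recipe's method: it swaps win32file.FindFilesW() for os.scandir() and assumes Python 3.6 or later. The function name folder_size and its parameters are hypothetical.

```python
import os


def folder_size(path, stop_size=None, errors=None):
    """Sum file sizes under `path`, stopping once `stop_size` is exceeded.

    Any OSError raised while reading a folder is stored in `errors`
    (path -> message) and the rest of that folder is skipped.
    Returns (total_bytes, errors).
    """
    if errors is None:
        errors = {}
    total = 0
    stack = [path]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                        if stop_size is not None and total > stop_size:
                            return total, errors  # cutoff reached
        except OSError as exc:
            errors[current] = str(exc)
    return total, errors
```

Calling folder_size(root, stop_size=0) returns as soon as the first non-empty file is seen, which matches the "anything greater than zero bytes" criterion described above. Symbolic links are not followed, a design choice made here to avoid cycles.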