BigDirs locates the directories that consume the most disk space, counting only the files at each directory's own level, not files in the nested directories below it. In my experience this is the easiest way to find where my disk space got chewed up.
Set DEFAULT_DIR to the root directory you wish to check. Set DEFAULT_THRESHOLD to the minimum size of the directories you wish to see in the output.
# -*- coding: cp1251 -*-
import os, os.path
import sys
import stat
import locale

__author__ = ["Jack Trainor (jacktrainor@gmail.com)",]
__version__ = "2010-07-17"

ONE_MEG = 2 ** 20
ONE_GIG = 2 ** 30
DEFAULT_THRESHOLD = 100 * ONE_MEG
DEFAULT_DIR = "c:\\"

class Walker(object):
    def __init__(self, dir):
        self.dir = dir

    def is_valid_file(self, file_name):
        return True

    def is_valid_dir(self, dir):
        return True

    def execute_file(self, path):
        pass

    def execute_dir(self, path):
        pass

    def execute(self):
        for root, dirs, file_names in os.walk(self.dir):
            for file_name in file_names:
                if self.is_valid_dir(root) and self.is_valid_file(file_name):
                    path = os.path.join(root, file_name)
                    self.execute_file(path)
            for dir in dirs:
                if self.is_valid_dir(root):
                    path = os.path.join(root, dir)
                    self.execute_dir(path)
        return self

class BigDirs(Walker):
    def __init__(self, dir, threshold=DEFAULT_THRESHOLD):
        Walker.__init__(self, dir)
        self.threshold = threshold
        self.dirs = {}

    def execute_file(self, path):
        try:
            file_size = os.path.getsize(path)
            dir, name = os.path.split(path)
            cur_size = self.dirs.get(dir, 0)
            self.dirs[dir] = cur_size + file_size
        except Exception, e:
            sys.stderr.write("%s %s %s\n" % ("BigDirs.execute_file", path, e))

    def execute(self):
        try:
            locale.setlocale(locale.LC_ALL, "")
            Walker.execute(self)
            keys = self.dirs.keys()
            decorated_list = [(self.dirs[key], key) for key in keys]
            decorated_list.sort()
            for item in decorated_list:
                if item[0] > self.threshold:
                    print "%10s MB %s" % (locale.format('%d', item[0] / ONE_MEG, True), item[1])
        except Exception, e:
            sys.stderr.write("%s %s\n" % ("BigDirs.execute", e))
        return self

if __name__ == "__main__":
    walker = BigDirs(DEFAULT_DIR, DEFAULT_THRESHOLD).execute()
    raw_input("BigDirs complete. Press RETURN...")
I wrote this because ZoneAlarm was silently consuming gigabytes in its tvDebug.log file and I'd forget where it was located and how to turn it off. (Google tvDebug.log if you're curious.)
BigDirs checks every file starting at the root directory, so be patient. It prints all the directories using more disk space than the threshold, sorted from smallest to largest.
Walker is a general purpose class I reuse for traversing a directory. It has more functionality than BigDirs needs.
This doesn't seem to like *.lnk files under Windows.
What are you seeing?
I'm running Python 2.5 under Windows XP SP3. BigDirs processes *.lnk files as small files (< 1 KB). I've added some exception handling to BigDirs.execute_file so that if there is a problem with a file it won't abort the run.
Thanks for trying the recipe!
I changed it to ignore links!
def execute_file(self, path):
    # Don't check links
    if not os.path.islink(path):
        try:
            ....
            ....
Pydirstat is an alternative, more full-featured, cross-platform python solution.
Pydirstat is a multi-megabyte application without source code.
Thanks, this was helpful.
Pydirstat appears to offer source code at http://prdownload.berlios.de/pydirstat/pydirstat-0.9.15.tar.gz, and states it is GPL on the project home page.