BigDirs finds the directories whose own files take up the most disk space. Each directory's total counts only the files stored directly in it, not those in the nested directories below it. In my experience this is the easiest way to find where my disk space got chewed up.

Set DEFAULT_DIR to the root directory you wish to check. Set DEFAULT_THRESHOLD to the minimum size of the directories you wish to see in the output.

Python, 74 lines
# -*- coding: cp1251 -*-
import os, os.path
import sys
import stat
import locale

__author__=["Jack Trainor (jacktrainor@gmail.com)",]
__version__="2010-07-17"

ONE_MEG = 2 ** 20
ONE_GIG = 2 ** 30
DEFAULT_THRESHOLD = 100 * ONE_MEG
DEFAULT_DIR = "c:\\"

class Walker(object):
    def __init__(self, dir):
        self.dir = dir
            
    def is_valid_file(self, file_name):
        return True
        
    def is_valid_dir(self, dir):
        return True
                  
    def execute_file(self, path):
        pass
            
    def execute_dir(self, path):
        pass

    def execute(self):
        for root, dirs, file_names in os.walk(self.dir):
            for file_name in file_names:
                if self.is_valid_dir(root) and self.is_valid_file(file_name):
                    path = os.path.join(root, file_name)
                    self.execute_file(path)
            for dir in dirs:
                if self.is_valid_dir(root):
                    path = os.path.join(root, dir)
                    self.execute_dir(path)
        return self 

class BigDirs(Walker):
    def __init__(self, dir, threshold=DEFAULT_THRESHOLD):
        Walker.__init__(self, dir)
        self.threshold = threshold
        self.dirs = {}

    def execute_file(self, path):
        try:
            file_size = os.path.getsize(path)
            dir, name = os.path.split(path)
            cur_size = self.dirs.get(dir, 0)
            self.dirs[dir] = cur_size + file_size
        except Exception, e:
            sys.stderr.write("%s %s %s\n" % ("BigDirs.execute_file", path, e))
        
    def execute(self):
        try:
            locale.setlocale(locale.LC_ALL, "")
            Walker.execute(self)
            keys = self.dirs.keys()
            decorated_list = [ (self.dirs[key], key) for key in keys ]
            decorated_list.sort()
            for item in decorated_list:
                if item[0] > self.threshold:
                    print "%10s MB %s" % (locale.format('%d', item[0]/ONE_MEG, True), item[1])
        except Exception, e:
            sys.stderr.write("%s %s\n" % ("BigDirs.execute", e))
        return self 

if __name__ == "__main__": 
    walker = BigDirs(DEFAULT_DIR, DEFAULT_THRESHOLD).execute()
    raw_input("BigDirs complete. Press RETURN...")

I wrote this because ZoneAlarm was silently consuming gigabytes in its tvDebug.log file and I'd forget where it was located and how to turn it off. (Google tvDebug.log if you're curious.)

BigDirs checks every file under the root directory, so be patient. It prints every directory whose files use more disk space than the threshold, sorted from smallest to largest.
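
The tally described above can be sketched in a few lines. This is a minimal sketch, assuming Python 3 (the recipe itself targets Python 2); the function names direct_sizes and over_threshold are illustrative and not part of the recipe.

```python
import os

def direct_sizes(root):
    """Map each directory under root to the bytes held by files directly in it."""
    totals = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it and keep going
            # Credit the size to the containing directory only, not its parents.
            totals[dirpath] = totals.get(dirpath, 0) + size
    return totals

def over_threshold(totals, threshold):
    """Return (size, directory) pairs above threshold, smallest first."""
    return sorted((size, d) for d, size in totals.items() if size > threshold)
```

Calling over_threshold(direct_sizes("c:\\"), 100 * 2 ** 20) would list the same kind of directories BigDirs prints with its default settings, as (bytes, path) pairs rather than formatted lines.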

Walker is a general-purpose class I reuse for traversing directory trees. It has more functionality than BigDirs needs.
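
As an illustration of reusing Walker for something other than size totals, here is a hedged sketch, assuming Python 3. The Walker class below is a condensed copy of the recipe's hooks, and LogFinder is a hypothetical subclass invented for this example; it overrides is_valid_file and execute_file to collect the paths of *.log files.

```python
import os

class Walker(object):
    """Condensed Python 3 copy of the recipe's Walker hooks."""
    def __init__(self, dir):
        self.dir = dir

    def is_valid_file(self, file_name):
        return True

    def is_valid_dir(self, dir):
        return True

    def execute_file(self, path):
        pass

    def execute_dir(self, path):
        pass

    def execute(self):
        for root, dirs, file_names in os.walk(self.dir):
            for file_name in file_names:
                if self.is_valid_dir(root) and self.is_valid_file(file_name):
                    self.execute_file(os.path.join(root, file_name))
            for dir in dirs:
                if self.is_valid_dir(root):
                    self.execute_dir(os.path.join(root, dir))
        return self

class LogFinder(Walker):
    """Hypothetical subclass: collect the paths of *.log files."""
    def __init__(self, dir):
        Walker.__init__(self, dir)
        self.found = []

    def is_valid_file(self, file_name):
        # Accept only names ending in .log, ignoring case.
        return file_name.lower().endswith(".log")

    def execute_file(self, path):
        self.found.append(path)
```

Because execute returns self, a call such as LogFinder("c:\\").execute().found yields the collected paths in one expression, the same chaining style the recipe uses for BigDirs.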

7 comments

David Klaffenbach 13 years, 8 months ago

This doesn't seem to like *.lnk files under windows.

Jack Trainor (author) 13 years, 8 months ago

What are you seeing?

I'm running Python 2.5 under Windows XP SP3. BigDirs treats *.lnk files as small files under 1 KB. I've added some exception handling to BigDirs.execute_file so that if there is a problem it won't abort the run.

Thanks for trying the recipe!

Jerry Rocteur 13 years, 8 months ago

I changed it to ignore links!

    def execute_file(self, path):
        # Don't check links
        if not os.path.islink(path):
            try:
                ....
                ....

s g 13 years, 7 months ago

Pydirstat is an alternative, more full-featured, cross-platform python solution.

Jack Trainor (author) 13 years, 7 months ago

Pydirstat is a multi-megabyte application without source code.

Vincent Lowe 13 years, 5 months ago

Thanks, this was helpful.

paul clinch 13 years, 1 month ago

Pydirstat appears to offer source code in http://prdownload.berlios.de/pydirstat/pydirstat-0.9.15.tar.gz, and states it is GPL on the project home page.