
This script recursively scans a given path and applies a cleaning 'action' to matching files and folders. By default, files and folders matching the specified (.endswith) patterns are deleted. Alternatively, _quoted_ glob patterns can be used with the '-g' or '--glob' option.
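
The difference between the two matching modes can be sketched as follows. This is only an illustration modelled on the script's two matchers; the helper names `matches_endswith` and `matches_glob` and the sample paths are assumptions made for this example and are not part of the script.

```python
from fnmatch import fnmatch

def matches_endswith(path, patterns):
    """True if the path ends with any of the given suffixes."""
    return any(path.endswith(p) for p in patterns)

def matches_glob(path, patterns):
    """True if the whole path matches any of the given glob patterns."""
    return any(fnmatch(path, p) for p in patterns)

# hypothetical sample paths, as they might be produced while walking '.'
paths = ["./pkg/mod.pyc", "./pkg/mod.py", "./pkg/.svn"]

# default mode: plain suffix matching
suffix_hits = [p for p in paths if matches_endswith(p, [".pyc", ".svn"])]

# -g mode: glob matching against the full path
glob_hits = [p for p in paths if matches_glob(p, ["*.pyc"])]

# -n mode: the same matcher, inverted
negated_hits = [p for p in paths if not matches_glob(p, ["*.py"])]
```

Glob patterns need to be quoted on the command line because an unquoted `*.pyc` would otherwise be expanded by the shell before the script sees it.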

By design, the script lists targets and asks permission before applying cleaning actions. It should be easy to extend this script with further actions and also more intelligent pattern matching functions.
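
As a rough sketch of what such an extension might look like, the snippet below mirrors the script's two lookup tables (matchers and actions) and adds one entry to each. The `regex` matcher, the `report` function and the `regex_report` action name are all hypothetical; they only illustrate the mechanism and do not exist in the script.

```python
import re
from fnmatch import fnmatch

patterns = [r".*\.py[co]$"]

matchers = {
    # the two matchers the script defines
    "endswith": lambda s: any(s.endswith(p) for p in patterns),
    "glob": lambda s: any(fnmatch(s, p) for p in patterns),
    # hypothetical addition: regular-expression matching
    "regex": lambda s: any(re.search(p, s) for p in patterns),
}

def report(path):
    """report path"""
    # hypothetical, harmless action: describe the target instead of changing it
    return "would clean %s" % path

actions = {
    # action name: (function applied to each target, matcher name)
    "regex_report": (report, "regex"),
}

func, matcher = actions["regex_report"]
candidates = ["./a.pyc", "./a.pyo", "./a.py"]
results = [func(p) for p in candidates if matchers[matcher](p)]
```

Keeping the matcher name separate from the function means a new action can reuse an existing matcher, and a new matcher can be paired with an existing action.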

The getch (single key confirmation) functionality comes courtesy of http://code.activestate.com/recipes/134892/

To use it, place the script in your path and call it something like 'clean':

Usage: clean [options] patterns

        deletes files/folder patterns:
            clean .svn .pyc
            clean -p /tmp/folder .svn .csv .bzr .pyc
            clean -g "*.pyc"
            clean -ng "*.py"

        converts line endings from windows to unix:
            clean -e .py
            clean -e -p /tmp/folder .py

Options:
  -h, --help            show this help message and exit
  -p PATH, --path=PATH  set path
  -n, --negated         clean everything except specified patterns
  -e, --endings         clean line endings
  -g, --glob            clean with glob patterns
  -v, --verbose
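
The '-e' action can be illustrated with a standalone sketch. Note that this is a variant, not the script's own code: the script rewrites every line with `rstrip()`, which also removes any other trailing whitespace, whereas the version below works in binary mode and replaces only `\r\n` pairs. The function name `crlf_to_lf` and the demo file are assumptions for this example.

```python
import os
import tempfile

def crlf_to_lf(path):
    """Rewrite a file in place, replacing Windows (CRLF) endings with LF."""
    with open(path, "rb") as old:
        data = old.read()
    with open(path, "wb") as new:
        new.write(data.replace(b"\r\n", b"\n"))

# demo on a throwaway file with Windows line endings
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(b"x = 1  \r\ny = 2\r\n")

crlf_to_lf(demo_path)

with open(demo_path, "rb") as f:
    converted = f.read()
os.remove(demo_path)
```

Working on bytes avoids any newline translation by the platform, so the result is the same on Windows and Unix.
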
Python, 268 lines
#!/usr/bin/env python
"""
This script recursively scans a given path and applies a cleaning 'action' 
to matching files and folders. By default files and folders matching the 
specified (.endswith) patterns are deleted. Alternatively, _quoted_ glob
patterns can be used with the '-g' option.

By design, the script lists targets and asks permission before applying 
cleaning actions. It should be easy to extend this script with further 
cleaning actions and more intelligent pattern matching techniques.

The getch (single key confirmation) functionality comes courtesy of 
http://code.activestate.com/recipes/134892/

To use it, place the script in your path and call it something like 'clean':

    Usage: clean [options] patterns
        
            deletes files/folder patterns:
                clean .svn .pyc
                clean -p /tmp/folder .svn .csv .bzr .pyc
                clean -g "*.pyc"
                clean -ng "*.py"
    
            converts line endings from windows to unix:
                clean -e .py
                clean -e -p /tmp/folder .py

    Options:
      -h, --help            show this help message and exit
      -p PATH, --path=PATH  set path
      -n, --negated         clean everything except specified patterns
      -e, --endings         clean line endings
      -g, --glob            clean with glob patterns
      -v, --verbose

"""
from __future__ import print_function
import os, sys, shutil
from fnmatch import fnmatch
from optparse import OptionParser
from os.path import join, isdir, isfile


# to enable single-character confirmation of choices
try:
    import sys, tty, termios
    def getch(txt):
        print(txt, end=' ')
        sys.stdout.flush()  # make sure the prompt is visible before blocking on input
        fd = sys.stdin.fileno()
        old_settings = termios.tcgetattr(fd)
        try:
            tty.setraw(sys.stdin.fileno())
            ch = sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
        return ch
except ImportError:
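    import msvcrt
    def getch(txt):
        print(txt, end=' ')
        sys.stdout.flush()
        # getwch() returns a text character; getch() returns bytes on Python 3,
        # which would never compare equal to 'y' or 'c'
        return msvcrt.getwch()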

# -----------------------------------------------------
# main class

class Cleaner(object):
    """recursively cleans patterns of files/directories
    """
    def __init__(self, path, patterns):
        self.path = path
        self.patterns = patterns
        self.matchers = {
            # a matcher is a boolean function which takes a string and tries 
            # to match it against any one of the specified patterns, 
            # returning False otherwise
            'endswith': lambda s: any(s.endswith(p) for p in patterns), 
            'glob': lambda s: any(fnmatch(s, p) for p in patterns),
        }
        self.actions = {
            # action: (path_operating_func, matcher)
            'endswith_delete': (self.delete, 'endswith'),
            'glob_delete': (self.delete, 'glob'),
            'convert': (self.clean_endings, 'endswith'),
        }
        self.targets = []
        self.cum_size = 0.0

    def __repr__(self):
        return "<Cleaner: path:%s , patterns:%s>" % (
            self.path, self.patterns)

    def _apply(self, func, confirm=False):
        """applies a function to each target path
        """
        i = 0
        desc = func.__doc__.strip()
        for target in self.targets:
            if confirm:
                question = "\n%s '%s' (y/n/q)? " % (desc, target)
                answer = getch(question)
                if answer in ['y', 'Y']:
                    func(target)
                    i += 1
                elif answer in ['q']: #i.e. quit
                    break
                else:
                    continue
            else:
                func(target)
                i += 1
        if i:
            self.log("Applied '%s' to %s items (%sK)" % (
                desc, i, int(round(self.cum_size/1024.0, 0))))
        else:
            self.log('No action taken')

    @staticmethod
    def _onerror(func, path, exc_info): 
        """ Error handler for shutil.rmtree.

            If the error is due to an access error (read only file)
            it attempts to add write permission and then retries.

            If the error is for another reason it re-raises the error.

            Usage : ``shutil.rmtree(path, onerror=onerror)``
            
            original code by Michael Foord
            bug fix suggested by Kun Zhang

        """
        import stat
        if not os.access(path, os.W_OK):
            # Is the error an access error ?
            os.chmod(path, stat.S_IWUSR)
            func(path)
        else:
            raise

    def log(self, txt):
        print('\n' + txt)

    def do(self, action, negate=False):
        """finds pattern and approves action on results
        """
        func, matcher = self.actions[action]
        if not negate:
            show = lambda p: p if self.matchers[matcher](p) else None
        else:
            show = lambda p: p if not self.matchers[matcher](p) else None
        
        results = self.walk(self.path, show)
        if results:
            question = "%s item(s) found. Apply '%s' to all (y/n/c)? " % (
                len(results), func.__doc__.strip())
            answer = getch(question)
            self.targets = results
            if answer in ['y','Y']:
                self._apply(func)
            elif answer in ['c', 'C']:
                self._apply(func, confirm=True)
            else:
                self.log("Action cancelled.")
        else:
            self.log("No results.")

    def walk(self, path, func, log=True):
        """walk path recursively collecting results of function application
        """
        results = []
        def visit(root, target, prefix):
            for i in target:
                item = join(root, i)
                obj = func(item)
                if obj:
                    results.append(obj)
                    self.cum_size += os.path.getsize(obj)
                    if log: 
                        print(prefix, obj)
        for root, dirs, files in os.walk(path):
            visit(root, dirs, ' +-->')
            visit(root, files,' |-->')
        return results

    def delete(self, path):
        """delete path
        """
        if isfile(path):
            os.remove(path)
        if isdir(path):
            shutil.rmtree(path, onerror=self._onerror)

    def clean_endings(self, path):
        """convert windows endings to unix endings
        """
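        # open() rather than file(): the file() builtin does not exist in Python 3
        with open(path) as old:
            lines = old.readlines()
        # note: rstrip() also strips any other trailing whitespace on each line
        string = "".join(l.rstrip()+'\n' for l in lines)
        with open(path, 'w') as new:
            new.write(string)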

    @classmethod
    def cmdline(cls):
        usage = """usage: %prog [options] patterns
        
        deletes files/folder patterns:
            %prog .svn .pyc
            %prog -p /tmp/folder .svn .csv .bzr .pyc
            %prog -g "*.pyc"
            %prog -gn "*.py"

        converts line endings from windows to unix:
            %prog -e .py
            %prog -e -p /tmp/folder .py"""

        parser = OptionParser(usage)
        parser.add_option("-p", "--path", 
                          dest="path", help="set path")
        
        parser.add_option("-n", "--negated",
                         action="store_true", dest="negated", 
                         help="clean everything except specified patterns")
    
        parser.add_option("-e", "--endings", 
                          action="store_true", dest="endings",
                          help="clean line endings")
        
        parser.add_option("-g", "--glob", 
                          action="store_true", dest="glob",
                          help="clean with glob patterns")
    
        parser.add_option("-v", "--verbose",
                          action="store_true", dest="verbose")

        (options, patterns) = parser.parse_args()

        if len(patterns) == 0:
            parser.error("incorrect number of arguments")

        if not options.path:
            options.path = '.'
    
        if options.verbose:
            print('options:', options)
            print('finding patterns: %s in %s' % (patterns, options.path))
        
        cleaner = cls(options.path, patterns)
    
        # convert line endings from windows to unix
        if options.endings and options.negated:
            cleaner.do('convert', negate=True)
        elif options.endings:
            cleaner.do('convert')
        
        # glob delete
        elif options.negated and options.glob:
            cleaner.do('glob_delete', negate=True)
        elif options.glob:
            cleaner.do('glob_delete')

        # endswith delete (default)
        elif options.negated:
            cleaner.do('endswith_delete', negate=True)
        else:
            cleaner.do('endswith_delete')

if __name__ == '__main__':
    Cleaner.cmdline()

11 comments

stewart midwinter 15 years, 3 months ago

works as advertised!

One improvement I could see is to add another confirmation option besides Y or N. clean may find a long list of files to delete, and you want to delete all but one or two. How about adding a C (for Confirm) option so that you are prompted to confirm the deletion of each file in the result?

stewart midwinter 15 years, 3 months ago

For bonus marks, how about a negation -n option? clean -n .py would delete everything BUT the .py files

thanks S

kibleur christophe 15 years, 2 months ago

Very nice, it was very useful for cleaning all my LaTeX boilerplate files: .aux .log .dvi and .out files.

Alia Khouri (author) 15 years, 2 months ago

Glad you found it useful. Also thanks for the suggestions. I've added the 'c' confirm option and the -n option as well... also code structure is a little cleaner (-:

As a side-note, I also created one version with fnmatch (glob-like) matching and also regex matching, but subsequently dropped it because that just added needless complexity and I also found quoting regexes off the command line to be somewhat counter-intuitive.

AK

Alia Khouri (author) 15 years, 2 months ago

A newer version with getch (single key confirmation) functionality coming courtesy of http://code.activestate.com/recipes/134892/

AK

Denis Barmenkov 14 years, 7 months ago

I've heard about using '_svn' instead of '.svn' directories with some IDE (MS Visual studio?).

Alia Khouri (author) 14 years, 7 months ago

I'm not aware of this, but _svn should work as well. Just do 'clean _svn'

Alia Khouri (author) 14 years, 6 months ago

New version with glob pattern matching just uploaded.

Kun Zhang 14 years, 6 months ago

Very nice, but it throws an exception on Windows. Please refer to http://trac.pythonpaste.org/pythonpaste/ticket/359; adding an onerror handler function and changing line #164 resolves the problem.

164: shutil.rmtree(path, onerror=onerror)

def onerror(func, path, exc_info):
    """ Error handler for shutil.rmtree.

    If the error is due to an access error (read only file)
    it attempts to add write permission and then retries.

    If the error is for another reason it re-raises the error.

    Usage : ``shutil.rmtree(path, onerror=onerror)``

    """
    import stat
    if not os.access(path, os.W_OK):
        # Is the error an access error ?
        os.chmod(path, stat.S_IWUSR)
        func(path)
    else:
        raise

Alia Khouri (author) 14 years, 6 months ago

Thanks to Kun for the bug report about Windows and suggested fix which I've included in the latest version.

Alia Khouri (author) 13 years, 2 months ago

New features:

  • python 2.6+ and 3.+ compatible

  • added report on cumulative size of files involved in cleaning operations
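
The cumulative size report can be sketched in isolation. This is a minimal illustration that assumes suffix matching and counts files only; `total_size` is a hypothetical helper, whereas the script itself accumulates the size inside its `walk` method as each target is found.

```python
import os
import shutil
import tempfile

def total_size(root, suffixes):
    """Return (matching file paths, combined size in bytes) under root."""
    hits, size = [], 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if any(full.endswith(s) for s in suffixes):
                hits.append(full)
                size += os.path.getsize(full)
    return hits, size

# build a small throwaway tree: one matching file, one non-matching file
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "a.pyc"), "wb") as f:
    f.write(b"\0" * 2048)
with open(os.path.join(demo_root, "a.py"), "wb") as f:
    f.write(b"\0" * 100)

hits, size = total_size(demo_root, [".pyc"])
# same rounding to whole kilobytes as the script's log message
report = "%s item(s), %sK" % (len(hits), int(round(size / 1024.0, 0)))

shutil.rmtree(demo_root)
```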