ActiveState Code

Recipe 576704: Python code minifier


Python Minifier: Reduces the size of Python code for use on embedded platforms. Performs the following:

  1. Removes docstrings.
  2. Removes comments.
  3. Minimizes code indentation.
  4. Joins multiline pairs of parentheses, braces, and brackets (and removes extraneous whitespace within).
  5. Preserves shebangs and encoding info (e.g. "# -*- coding: utf-8 -*-").
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#       pyminifier.py
#
#       Copyright 2009 Dan McDougall <YouKnowWho@YouKnowWhat.com>
#
#       This program is free software; you can redistribute it and/or modify
#       it under the terms of the GNU General Public License as published by
#       the Free Software Foundation; Version 3 of the License
#
#       This program is distributed in the hope that it will be useful,
#       but WITHOUT ANY WARRANTY; without even the implied warranty of
#       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#       GNU General Public License for more details.
#
#       You should have received a copy of the GNU General Public License
#       along with this program; if not, the license can be downloaded here:
#
#       http://www.gnu.org/licenses/gpl.html

# Meta
__version__ = '1.1'
__license__ = "GNU General Public License (GPL) Version 3"
__version_info__ = (1, 1)
__author__ = 'Dan McDougall <YouKnowWho@YouKnowWhat.com>'

import os, sys, re
from pyparsing import QuotedString, Suppress, Keyword, Optional, Word, Literal, ZeroOrMore, alphanums, restOfLine, replaceWith, pythonStyleComment, printables

"""
Python Minifier:  Reduces the size of Python code for use on embedded platforms.

Performs the following:
    1) Removes docstrings.
    2) Removes comments.
    3) Minimizes code indentation.
    4) Joins multiline pairs of parentheses, braces, and brackets (and removes extraneous whitespace within).
    5) Preserves shebangs and encoding info (e.g. "# -*- coding: utf-8 -*-").
"""

# Compile our regular expressions for speed
multiline_quoted_string_regex = re.compile(r'(\'\'\'|\"\"\")')
not_quoted_string_regex = re.compile(r'(\".*\'\'\'.*\"|\'.*\"\"\".*\')')
double_quoted_string_regex = re.compile(r'((?<!\\)".*?(?<!\\)")')
single_quoted_string_regex = re.compile(r"((?<!\\)'.*?(?<!\\)')")
whitespace = re.compile('\s*')
trailing_newlines = re.compile(r'\n\n')
shebang = re.compile('^#\!.*$')
encoding = re.compile(".*coding[:=]\s*([-\w.]+)")
comment = re.compile("(?!(\'|\")*#.*(\'|\"))\s*#.*")
blank_lines = re.compile("\n\s*\n")
#parens = re.compile("\((?P<parens>[^()]|\(\))*\)", re.MULTILINE|re.DOTALL)
multiline_indicator = re.compile('\\\\(\s*#.*)?\n') # Also removes trailing comments: "test = 'blah \ # comment here"
# Operators (for future use)
#commas = re.compile("(?!\'.*\')\s*\,\s*\n*\s*") # To be replaced with ","
#plus_signs = re.compile("(?!\'.*\')\s*\+\s*\n*\s*") # To be replaced with "+"
#minus_signs = re.compile("(?!\'.*\')\s*\-\s*\n*\s*") # To be replaced with "-"
#multiply_signs = re.compile("(?!\'.*\')\s*\*\s*\n*\s*") # To be replaced with "*"
#divide_signs = re.compile("(?!\'.*\')\s*\/\s*\n*\s*") # To be replaced with "/"
#less_signs = re.compile("(?!\'.*\')\s*\<\s*\n*\s*") # To be replaced with "<"
#greater_signs = re.compile("(?!\'.*\')\s*\>\s*\n*\s*") # To be replaced with ">"
#equal_signs = re.compile("(?!\'.*\')\s*\s*\=\s*\n*\s*") # To be replaced with "="
#equals_signs = re.compile("(?!\'.*\')\s*\=\=\s*\n*\s*") # To be replaced with "=="
#not_equal_signs = re.compile("(?!\'.*\')\s*\!\=\s*\n*\s*") # To be replaced with "!="
#add_assign = re.compile("(?!\'.*\')\s*\+\=\s*\n*\s*") # To be replaced with "+="
#sub_assign = re.compile("(?!\'.*\')\s*\-\=\s*\n*\s*") # To be replaced with "-="
#modulus_assign = re.compile("(?!\'.*\')\s*\%\=\s*\n*\s*") # To be replaced with "%="
#multiply_assign = re.compile("(?!\'.*\')\s*\*\=\s*\n*\s*") # To be replaced with "*="
#powers_assign = re.compile("(?!\'.*\')\s*\*\*\=\s*\n*\s*") # To be replaced with "**="
#divide_assign = re.compile("(?!\'.*\')\s*\/\=\s*\n*\s*") # To be replaced with "/="
#truncate_divide_assign = re.compile("(?!\'.*\')\s*\/\/\=\s*\n*\s*") # To be replaced with "*//="
#truncated_divide_signs = re.compile("(?!\'.*\')\s*\/\/\s*\n*\s*") # To be replaced with "//"
#powers_signs = re.compile("(?!\'.*\')\s*\*\*\s*\n*\s*") # To be replaced with "**"
#left_shift_signs = re.compile("(?!\'.*\')\s*\<\<\s*\n*\s*") # To be replaced with "<<"
#right_shift_signs = re.compile("(?!\'.*\')\s*\*>\>\s*\n*\s*") # To be replaced with ">>"
#modulos_signs = re.compile("(?!\'.*\')\s*\%\s*\n*\s*") # To be replaced with "%"
#and_signs = re.compile("(?!\'.*\')\s*\&\s*\n*\s*") # To be replaced with "&"
#or_signs = re.compile("(?!\'.*\')\s*\|\s*\n*\s*") # To be replaced with "|"
#xor_signs = re.compile("(?!\'.*\')\s*\^\s*\n*\s*") # To be replaced with "^"
#negation_signs = re.compile("(?!\'.*\')\s*\~\s*\n*\s*") # To be replaced with "~"

def substitute_matches(matchlist, opener_regex, closer_regex, opener_sub, closer_sub):
    """Replaces 'opener' and 'closer' in 'matchlist' with 'opener_sub' and 'closer_sub'"""
    preoutput = ""
    for item in matchlist:
        if item:
            if item[0] == '"':
                # Sub out all the matching pairs with something so they don't match later on (we'll change them back at the end)
                item = opener_regex.sub('%s' % opener_sub, item)
                item = closer_regex.sub('%s' % closer_sub, item)
                preoutput += item
            else:
                preoutput += item
    line = "".join(preoutput)
    return line

def join_multiline_pairs(text, pair="()"):
    """Finds and removes newlines in multiline matching pairs of characters in 'text'.
    For example, "(.*\n.*), {.*\n.*}, or [.*\n.*]").
    By default it joins parens () but it will join any two characters it is passed in the 'pair' variable.
    """
    # Readability variables
    opener = pair[0]
    closer = pair[1]

    # Tracking variables
    inside_pair = False
    inside_quotes = False
    inside_double_quotes = False
    inside_single_quotes = False
    quoted_string = False
    openers = 0
    closers = 0
    linecount = 0

    # Static variables
    opener_sub = '###OPENER###'
    closer_sub = '###CLOSER###'

    # Regular expressions
    opener_regex = re.compile('\%s' % opener)
    closer_regex = re.compile('\%s' % closer)
    opener_sub_regex = re.compile('(?!(\'|\"))%s(?!(\'|\"))' % opener_sub)
    closer_sub_regex = re.compile('(?!(\'|\"))%s(?!(\'|\"))' % closer_sub)

    output = ""

    for line in text.split('\n'):
        escaped = False
        multline_match = multiline_quoted_string_regex.search(line)
        not_quoted_string_match = not_quoted_string_regex.search(line)
        if multline_match and not not_quoted_string_match and not quoted_string:
            output += line + '\n'
            quoted_string = True
        elif quoted_string and multiline_quoted_string_regex.search(line):
            output += line + '\n'
            quoted_string = False
        elif opener_regex.search(line) or closer_regex.search(line) or inside_pair:
            for character in line:
                if character == opener:
                    if not escaped:
                        openers += 1
                        inside_pair = True
                        output += character
                    else:
                        escaped = False
                        output += character
                elif character == closer:
                    if not escaped:
                        if openers == (closers + 1) and openers != 0:
                            closers = 0
                            openers = 0
                            inside_pair = False
                            output += character
                        else:
                            closers += 1
                            output += character
                    else:
                        escaped = False
                        output += character
                elif character == '\\':
                    if escaped:
                        escaped = False
                        output += character
                    else:
                        escaped = True
                        output += character
                elif character == '"' and escaped:
                    output += character
                    escaped = False
                elif character == "'" and escaped:
                    output += character
                    escaped = False
                elif character == '"' and inside_quotes:
                    if inside_single_quotes:
                        output += character
                    else:
                        inside_quotes = False
                        inside_double_quotes = False
                        output += character
                elif character == "'" and inside_quotes:
                    if inside_double_quotes:
                        output += character
                    else:
                        inside_quotes = False
                        inside_single_quotes = False
                        output += character
                elif character == '"' and not inside_quotes:
                    inside_quotes = True
                    inside_double_quotes = True
                    output += character
                elif character == "'" and not inside_quotes:
                    inside_quotes = True
                    inside_single_quotes = True
                    output += character
                elif character == ' ' and inside_pair and not inside_quotes:
                    pass
                else:
                    if escaped:
                        escaped = False
                    output += character
            if inside_pair == False:
                output += '\n'
        else:
            output += line + '\n'

    # Clean up
    output = opener_sub_regex.sub('%s' % opener, output)
    output = closer_sub_regex.sub('%s' % closer, output)
    output = trailing_newlines.sub('\n', output)

    return output

def dedent(source):
    """Minimizes indentation to save precious bytes"""
    indentation_list = []
    output = ""
    # First find all the levels of indentation
    for line in source.split('\n'):
        indentation_level = len(line.rstrip()) - len(line.strip())
        if indentation_level not in indentation_list:
            indentation_list.append(indentation_level)
    # Now we can reduce each line's indentation to the minimal value
    for line in source.split('\n'):
        indentation_level = len(line.rstrip()) - len(line.strip())
        for i,v in enumerate(indentation_list):
            if indentation_level == v:
                output += " " * i + line.lstrip() + "\n"
    return output

    #def reduce_operators(source):
    #"""Removes spaces and newlines between operators"""
    #source = multiline_indicator.sub('', source)

    # The following is meant to remove space between operators but it currently has issues (working on it).
    #source = commas.sub(',', source)
    #source = plus_signs.sub('+', source)
    #source = minus_signs.sub('-', source)
    #source = multiply_signs.sub('*', source)
    #source = divide_signs.sub('/', source)
    #source = less_signs.sub('<', source)
    #source = greater_signs.sub('>', source)
    #source = equal_signs.sub('=', source)
    #source = equals_signs.sub('==', source)
    #source = not_equal_signs.sub('<!=', source)
    #source = add_assign.sub('+=', source)
    #source = sub_assign.sub('-=', source)
    #source = modulus_assign.sub('%=', source)
    #source = multiply_assign.sub('*=', source)
    #source = powers_assign.sub('**=', source)
    #source = divide_assign.sub('/=', source)
    #source = truncate_divide_assign.sub('//=', source)
    #source = truncated_divide_signs.sub('//', source)
    #source = powers_signs.sub('**', source)
    #source = left_shift_signs.sub('<<', source)
    #source = right_shift_signs.sub('>>', source)
    #source = modulos_signs.sub('%', source)
    #source = and_signs.sub('&', source)
    #source = or_signs.sub('|', source)
    #source = xor_signs.sub('^', source)
    #source = negation_signs.sub('~', source)
    #return source

def empty_method():
    """Just a test method.  This should be replaced with 'def empty_method(): pass'"""

def fix_empty_methods(source):
    """Appends 'pass' to empty methods/functions (i.e. where there was nothing but a docstring before we removed docstrings =)"""
    def_indentation_level = 0
    output = ""
    just_matched = False
    previous_line = None
    method = re.compile(r'^\s*def\s*.*\(.*\):.*$')
    for line in source.split('\n'):
        if len(line.strip()) > 0: # Don't look at blank lines
            if just_matched == True:
                this_indentation_level = len(line.rstrip()) - len(line.strip())
                if def_indentation_level == this_indentation_level:
                    # This method is empty, insert a 'pass' statement
                    output += "%s pass\n%s\n" % (previous_line, line)
                else:
                    output += "%s\n%s\n" % (previous_line, line)
                just_matched = False
            elif method.match(line):
                def_indentation_level = len(line) - len(line.strip())
                just_matched = True
                previous_line = line
            else:
                output += "%s\n" % line
        else:
            output += "\n"
    return output

def remove_docstrings(source):
    """Removes docstrings from the source"""
    method = (
        Suppress(Keyword("def") +
        Word(alphanums+"_") +
        '(' + ZeroOrMore(Word(alphanums+"_")) + ')' + ":")
    )
    doc = Keyword("__doc__")
    
    # This removes multiline docstrings
    string = ( 
        (QuotedString(quoteChar='\"\"\"', escChar='\\', multiline=True) | \
        QuotedString(quoteChar="\'\'\'", escChar='\\', multiline=True))
    )
    multiLineDocstring = (Optional(doc + Literal('=') + Optional('\\')) + string)
    multiLineDocstring.setParseAction(replaceWith(""))
    source = multiLineDocstring.transformString(source)

    # This removes single line docstrings
    singleLineDocstring = (
        Suppress(method) +
        (QuotedString(quoteChar='"', escChar='\\', multiline=False) | \
        QuotedString(quoteChar="'", escChar='\\', multiline=False))
    )
    singleLineDocstring.setParseAction(replaceWith(""))

    return singleLineDocstring.transformString(source)

def minify(source):
    """Remove all docstrings, comments, blank lines, and minimize code indentation from 'source' (string)."""
    preserved_shebang = None
    preserved_encoding = None

    source = remove_docstrings(source)

    # This loop is for things that must be preserved precisely
    for line in source.split('\n')[0:2]:
        # Save the first comment line if it starts with a shebang (#!) so we can re-add it later
        if shebang.match(line):
            preserved_shebang = line

        # Save the encoding string so we can re-add it later
        if encoding.match(line):
            preserved_encoding = line

    # Remove comments
    source = comment.sub('', source)

    # TODO: This currently isn't working for some reason
    #       probably due to escape character detection in join_multiline_pairs()
    # Remove multilines (e.g. lines that end with '\' followed by a newline)
    source = multiline_indicator.sub('', source)

    # Join multiline pairs of parens, brackets, and braces
    source = join_multiline_pairs(source)
    #source = join_multiline_pairs(source, '[]')
    #source = join_multiline_pairs(source, '{}')

    # Re-add preserved items
    if preserved_encoding:
        source = preserved_encoding + "\n" + source
    if preserved_shebang:
        source = preserved_shebang + "\n" + source

    # Minimize indentation
    source = dedent(source)

    # Add 'pass' to empty methods/functions (i.e. those left with no body once docstrings are removed)
    source = fix_empty_methods(source)

    # Remove blank lines
    source = blank_lines.sub('\n', source)

    return source

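def main():
    if len(sys.argv) > 1:
        # Read the whole file and close the handle once done
        with open(sys.argv[1]) as source_file:
            source = source_file.read()
        # Single-argument print() behaves the same on Python 2 and 3
        print(minify(source))
    else:
        print("Usage: pyminifier.py <python source file>")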

if __name__ == "__main__":
    main()
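
The indentation step can also be sketched on its own. The dedent() function above ranks indentation widths in the order they first appear; the variant below is a minimal sketch that sorts the widths first, so a width seen late in the file cannot end up deeper than a larger one seen earlier. It assumes Python 3, space-only indentation, and no multi-line string literals whose content must be preserved. The name minimize_indentation is hypothetical and not part of the recipe.

```python
def minimize_indentation(source):
    """Map each distinct indentation width to its rank among all widths used."""
    lines = source.split("\n")
    # Collect the leading-space widths of all non-blank lines, smallest first
    widths = sorted({len(l) - len(l.lstrip(" ")) for l in lines if l.strip()})
    rank = {width: i for i, width in enumerate(widths)}
    out = []
    for l in lines:
        if not l.strip():
            out.append("")  # blank lines carry no indentation
            continue
        width = len(l) - len(l.lstrip(" "))
        out.append(" " * rank[width] + l.lstrip(" "))
    return "\n".join(out)


sample = "def f():\n    if x:\n        return 1\n    return 2\n"
print(minimize_indentation(sample))
```

Because the mapping from old width to new width is strictly increasing, lines that were at the same, deeper, or shallower level relative to each other stay that way.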

Discussion

I wrote this so I could minimize the size of Python code being run on embedded platforms (e.g. OpenWRT). Minified and zipped modules can save a lot of space when applied to a large number of files. Here's an example of the output from minifying itself (Note: For whatever reason this website doesn't display the indentation properly):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__version__ = '1.1'
__license__ = "GNU General Public License (GPL) Version 3"
__version_info__ = (1,1)
__author__ = 'Dan McDougall <YouKnowWho@YouKnowWhat.com>'
import os, sys, re
from pyparsing import QuotedString, Suppress, Keyword, Optional, Word, Literal, ZeroOrMore, alphanums, restOfLine, replaceWith, pythonStyleComment, printables
multiline_quoted_string_regex = re.compile(r'(\'\'\'|\"\"\")')
not_quoted_string_regex = re.compile(r'(\".*\'\'\'.*\"|\'.*\"\"\".*\')')
double_quoted_string_regex = re.compile(r'((?<!\\)".*?(?<!\\)")')
single_quoted_string_regex = re.compile(r"((?<!\\)'.*?(?<!\\)')")
whitespace = re.compile('\s*')
trailing_newlines = re.compile(r'\n\n')
shebang = re.compile('^#\!.*$')
encoding = re.compile(".*coding[:=]\s*([-\w.]+)")
comment = re.compile("(?!(\'|\")*#.*(\'|\"))\s*#.*")
blank_lines = re.compile("\n\s*\n")
multiline_indicator = re.compile('\\\\(\s*#.*)?\n')
def substitute_matches(matchlist,opener_regex,closer_regex,opener_sub,closer_sub):
 preoutput = ""
 for item in matchlist:
  if item:
   if item[0] == '"':
    item = opener_regex.sub('%s'%opener_sub,item)
    item = closer_regex.sub('%s'%closer_sub,item)
    preoutput += item
   else:
    preoutput += item
 line = "".join(preoutput)
 return line
def join_multiline_pairs(text,pair="()"):
 opener = pair[0]
 closer = pair[1]
 inside_pair = False
 inside_quotes = False
 inside_double_quotes = False
 inside_single_quotes = False
 quoted_string = False
 openers = 0
 closers = 0
 linecount = 0
 opener_sub = '###OPENER###'
 closer_sub = '###CLOSER###'
 opener_regex = re.compile('\%s'%opener)
 closer_regex = re.compile('\%s'%closer)
 opener_sub_regex = re.compile('(?!(\'|\"))%s(?!(\'|\"))'%opener_sub)
 closer_sub_regex = re.compile('(?!(\'|\"))%s(?!(\'|\"))'%closer_sub)
 output = ""
 for line in text.split('\n'):
  escaped = False
  multline_match = multiline_quoted_string_regex.search(line)
  not_quoted_string_match = not_quoted_string_regex.search(line)
  if multline_match and not not_quoted_string_match and not quoted_string:
   output += line + '\n'
   quoted_string = True
  elif quoted_string and multiline_quoted_string_regex.search(line) and not quoted_string:
   output += line + '\n'
   quoted_string = False
  elif opener_regex.search(line) or closer_regex.search(line) or inside_pair:
   for character in line:
    if character == opener:
     if not escaped:
      openers += 1
      inside_pair = True
      output += character
     else:
      escaped = False
      output += character
    elif character == closer:
     if not escaped:
      if openers == (closers+1) and openers != 0:
       closers = 0
       openers = 0
       inside_pair = False
       output += character
      else:
       closers += 1
       output += character
     else:
      escaped = False
      output += character
    elif character == '\\':
     if escaped:
      escaped = False
      output += character
     else:
      escaped = True
      output += character
    elif character == '"' and escaped:
     output += character
     escaped = False
    elif character == "'" and escaped:
     output += character
     escaped = False
    elif character == '"' and inside_quotes:
     if inside_single_quotes:
      output += character
     else:
      inside_quotes = False
      inside_double_quotes = False
      output += character
    elif character == "'" and inside_quotes:
     if inside_double_quotes:
      output += character
     else:
      inside_quotes = False
      inside_single_quotes = False
      output += character
    elif character == '"' and not inside_quotes:
     inside_quotes = True
     inside_double_quotes = True
     output += character
    elif character == "'" and not inside_quotes:
     inside_quotes = True
     inside_single_quotes = True
     output += character
    elif character == ' ' and inside_pair and not inside_quotes:
     pass
    else:
     if escaped:
      escaped = False
     output += character
   if inside_pair == False:
    output += '\n'
  else:
   output += line + '\n'
 output = opener_sub_regex.sub('%s'%opener,output)
 output = closer_sub_regex.sub('%s'%closer,output)
 output = trailing_newlines.sub('\n',output)
 return output
def dedent(source):
 indentation_list = []
 output = ""
 for line in source.split('\n'):
  indentation_level = len(line.rstrip()) - len(line.strip())
  if indentation_level not in indentation_list:
   indentation_list.append(indentation_level)
 for line in source.split('\n'):
  indentation_level = len(line.rstrip()) - len(line.strip())
  for i,v in enumerate(indentation_list):
   if indentation_level == v:
    output += " " * i + line.lstrip() + "\n"
 return output
 source = multiline_indicator.sub('',source)
def empty_method(): pass
def fix_empty_methods(source):
 def_indentation_level = 0
 output = ""
 just_matched = False
 previous_line = None
 method = re.compile(r'^\s*def\s*.*\(.*\):.*$')
 for line in source.split('\n'):
  if len(line.strip()) > 0:
   if just_matched == True:
    this_indentation_level = len(line.rstrip()) - len(line.strip())
    if def_indentation_level == this_indentation_level:
     output += "%s pass\n%s\n" % (previous_line,line)
    else:
     output += "%s\n%s\n" % (previous_line,line)
    just_matched = False
   elif method.match(line):
    def_indentation_level = len(line) - len(line.strip())
    just_matched = True
    previous_line = line
   else:
    output += "%s\n" % line
  else:
   output += "\n"
 return output
def remove_docstrings(source):
 method = (Suppress(Keyword("def")+Word(alphanums+"_")+'('+ZeroOrMore(Word(alphanums+"_"))+')'+":"))
 doc = Keyword("__doc__")
 string = ((QuotedString(quoteChar='\"\"\"',escChar='\\',multiline=True)|QuotedString(quoteChar="\'\'\'",escChar='\\',multiline=True)))
 multiLineDocstring = (Optional(doc+Literal('=')+Optional('\\'))+string)
 multiLineDocstring.setParseAction(replaceWith(""))
 source = multiLineDocstring.transformString(source)
 singleLineDocstring = (Suppress(method)+(QuotedString(quoteChar='"',escChar='\\',multiline=False)|QuotedString(quoteChar="'",escChar='\\',multiline=False)))
 singleLineDocstring.setParseAction(replaceWith(""))
 return singleLineDocstring.transformString(source)
def minify(source):
 preserved_shebang = None
 preserved_encoding = None
 source = remove_docstrings(source)
 for line in source.split('\n')[0:2]:
  if shebang.match(line):
   preserved_shebang = line
  if encoding.match(line):
   preserved_encoding = line
 source = comment.sub('',source)
 source = multiline_indicator.sub('',source)
 source = join_multiline_pairs(source)
 if preserved_encoding:
  source = preserved_encoding + "\n" + source
 if preserved_shebang:
  source = preserved_shebang + "\n" + source
 source = dedent(source)
 source = fix_empty_methods(source)
 source = blank_lines.sub('\n',source)
 return source
def main():
 if len(sys.argv) > 1:
  source = open(sys.argv[1]).read()
  print minify(source)
 else:
  print "Usage: pyminifier.py <python source file>"
if __name__ == "__main__":
 main()
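
To see how much the combination of minifying and zipping mentioned in the discussion saves for a given module, the byte counts can be compared directly. This is a minimal sketch, assuming Python 3 and using zlib as a stand-in for whatever archive format the target platform uses. The function size_report and the two sample strings are hypothetical.

```python
import zlib


def size_report(original, minified):
    """Return (raw_bytes, zlib_compressed_bytes) for both versions of a source."""
    report = {}
    for label, text in (("original", original), ("minified", minified)):
        data = text.encode("utf-8")
        report[label] = (len(data), len(zlib.compress(data, 9)))
    return report


# Hypothetical before/after pair; in practice 'after' would come from minify()
before = "def f(x):\n    # add one\n    return x + 1\n"
after = "def f(x):\n return x + 1\n"
for label, (raw, packed) in size_report(before, after).items():
    print(label, raw, packed)
```

On very small inputs the compressed size can exceed the raw size because of the zlib header, so the comparison is only meaningful on real modules.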

Comments

  1. At 7:42 p.m. on 31 Mar 2009, Daniel Lepage said:

    Discarding all comments removes the encoding declaration. If the target script uses e.g. UTF-8 with non-ASCII characters, the output will generate a syntax error.

  2. At 6:49 a.m. on 1 Apr 2009, Dan McDougall (the author) said:

    Good catch. I'll update the script to fix that (shouldn't be too hard).

    Note: It also messes up quoted # signs like this:

    Comment = "#"
    

    ...results in Comment = ". Still trying to figure out how to fix that.

  3. At 8:51 a.m. on 1 Apr 2009, Akira Fora said:

    Good shot. You could save a few more bytes by:

    - transforming multiline instructions into single-line instructions;
    - removing spaces around operators.

    E.g.

    singleLineDocstring = (
        method +
        (QuotedString(quoteChar='"', escChar='\\', multiline=False) | \
        QuotedString(quoteChar="'", escChar='\\', multiline=False))
    )

    can be transformed into:

    singleLineDocstring=(method+(QuotedString(quoteChar='"',escChar='\\',multiline=False)|QuotedString(quoteChar="'",escChar='\\',multiline=False)))

  4. At 9:26 a.m. on 1 Apr 2009, Akira Fora said:

    You can achieve further size reduction by obfuscating the code.
    I didn't find any solid estimate of the size reduction that obfuscation brings to Python code, but the Yahoo Developer Network claims that, applied to CSS and JavaScript:
    "In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation." (http://developer.yahoo.com/performance/rules.html)
    I guess the advantage of obfuscation over minification is even more significant for Python than for CSS and JavaScript, due to the respective syntax of these languages.
    However, obfuscators are more complex than minifiers, and obfuscated code is a pain if you have to debug it.

  5. At 9:40 a.m. on 1 Apr 2009, Dan McDougall (the author) said:

    Yeah, I thought about adding some obfuscation to this, but I changed my mind because it makes debugging nearly impossible, and when you're working on embedded platforms you often can't debug your code anywhere but on the device itself.

    Also, I had originally planned to reduce spaces between operators but I never got around to it. It is on the TODO list =)

  6. At 6:57 p.m. on 2 Apr 2009, Dan McDougall (the author) said:

    UPDATE: It took FOREVER (and a lot of code) to work it all out but I've finally got this joining multiline pairs of parens, braces, and brackets (and removing unnecessary whitespace inside them). So something like this:

    myvar = ('test',
        'test2',
    )
    

    Becomes...

    myvar = ('test','test2',)
    

    I googled and googled but I never did find another example of code that does that. Not even in another language. The hardest part was getting it to ignore things in triple double/single quotes and dealing with escaped characters (especially code that literally has something like "'\'").
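
The joining of bracketed lines described in this comment can also be approached with the standard library tokenizer, which already knows where strings and escapes begin and end. The following is a minimal sketch of that alternative, not the recipe's character-by-character method. It assumes Python 3 and that comments have already been removed; join_bracketed_lines is a hypothetical name.

```python
import io
import tokenize

OPENERS = "([{"
CLOSERS = ")]}"


def join_bracketed_lines(source):
    """Join physical lines that are split inside (), [] or {}.

    Assumes comments were already removed; a trailing comment on a joined
    line would otherwise swallow the text that follows it.
    """
    depth = 0
    rows_to_join = set()  # 1-based rows whose newline sits inside a bracket pair
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in OPENERS:
            depth += 1
        elif tok.type == tokenize.OP and tok.string in CLOSERS:
            depth -= 1
        elif tok.type == tokenize.NL and depth > 0:
            rows_to_join.add(tok.start[0])

    out = []
    continuing = False
    # Split exactly as the tokenizer did so row numbers line up
    for row, line in enumerate(io.StringIO(source).readlines(), 1):
        if continuing:
            line = line.lstrip(" \t")
        continuing = row in rows_to_join
        if continuing:
            line = line.rstrip()
            # Keep one space where two names, numbers or strings would fuse
            if line and (line[-1].isalnum() or line[-1] in "_'\""):
                line += " "
        out.append(line)
    return "".join(out)


print(join_bracketed_lines("myvar = ('test',\n    'test2',\n)\n"))
```

This sketch only removes the line breaks and the indentation that follows them; spaces inside a single line are left alone.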

  7. At 7 p.m. on 2 Apr 2009, Dan McDougall (the author) said:

    Almost forgot: I've also got it preserving shebangs (#!/usr/bin/env python) and encoding strings now (per Daniel Lepage's comment).
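
The check for these two header lines can be sketched on its own. This is a minimal sketch, assuming Python 3; the encoding pattern is the one given in PEP 263 rather than the recipe's own encoding regex, and header_lines is a hypothetical helper.

```python
import re

SHEBANG = re.compile(r"^#!")
# Encoding declaration pattern as given in PEP 263
CODING = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def header_lines(source):
    """Return the shebang and encoding lines found in the first two lines."""
    kept = []
    for line in source.split("\n")[:2]:
        if SHEBANG.match(line) or CODING.match(line):
            kept.append(line)
    return kept


print(header_lines("#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport os\n"))
```

Anchoring the pattern to a comment line avoids picking up ordinary code in the first two lines that happens to contain text such as encoding=name.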

  8. At 5:08 p.m. on 14 Apr 2009, Dan McDougall (the author) said:

    Update (rev 10): Fixed some bugs where it wasn't working properly in certain (odd) situations. It also properly joins continued lines (i.e. lines that end in '\') again.

  9. At 8:09 a.m. on 14 Jul 2009, jcaballero.hep said:

    It doesn't work when the original code includes lines like the following one:

    print "this is an example # of line which fails"

    pyminifier interprets everything after the # character as a comment and, therefore, the rest of the code is not processed correctly.
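
One way around this class of problem is to let the standard library tokenizer decide what is a comment, since it reports a # inside a string literal as part of the STRING token. The following is a minimal sketch of that approach, assuming Python 3; it is not the recipe's regex-based method, and strip_comments is a hypothetical name. The tokenizer does not parse statements, so the Python 2 print line from this comment is accepted as input.

```python
import io
import tokenize


def strip_comments(source):
    """Remove # comments without touching '#' characters inside string literals."""
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            line = lines[row - 1]
            newline = "\n" if line.endswith("\n") else ""
            # A comment is always the last token on its line, so cut from its start
            lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)


sample = 'print "this is an example # of line which fails"  # real comment\nComment = "#"\n'
print(strip_comments(sample))
```

Note that this sketch treats the shebang and the encoding declaration as ordinary comments, so those lines would still need to be saved beforehand and re-added afterwards.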

  10. At 1:40 a.m. on 27 Aug 2009, robert marshall said:

    If I give it a line with redundant brackets:

    if (not result):
    

    this gets converted to

    if (notresult):
    

    i.e. with no space between the 'not' and the variable

  11. At 12:14 p.m. on 15 Nov 2009, etatS evictA said:

    The problem mentioned by Robert Marshall can be fixed by changing

                elif character == ' ' and inside_pair and not inside_quotes:
                    pass
    

    to

                elif character == ' ' and inside_pair and not inside_quotes:
                    if output[-1] != ' ':
                        output += ' '
    

    or even better:

                elif character == ' ' and inside_pair and not inside_quotes:
                    if not output[-1] in [' ', opener]:
                        output += ' '
    

    jcaballero.hep mentions a problem that I haven't been able to fix and that is a showstopper for me. It is some sort of error in the comment regular expression. I would be very glad if anyone could solve it.

    Also, I didn't try this, but there will be trouble on """ ''' asd ''' """

    (which could very reasonably appear e.g. in doctests)
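
The nested-quote case above can be checked quickly against the standard library tokenizer, which reports a triple-quoted string as a single STRING token regardless of which quote characters appear inside it. This is a minimal sketch, assuming Python 3; string_tokens is a hypothetical helper and says nothing about how the recipe itself behaves on this input.

```python
import io
import tokenize


def string_tokens(source):
    """Return the text of every string literal token found in 'source'."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]


print(string_tokens('x = """ \'\'\' asd \'\'\' """\n'))
```

A minifier built on token boundaries like these would not need separate state for single, double, and triple quotes.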
