ActiveState Code

Recipe 52298: Colorize Python source using the built-in tokenizer


This code is part of MoinMoin (http://moin.sourceforge.net/) and converts Python source code to HTML markup, rendering comments, keywords, operators, numeric and string literals in different colors.

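It shows how to use the built-in keyword, token and tokenize modules to scan Python source code and re-emit it with its original formatting intact. That is the hard part. The tokenizer reports each token with its (row, column) positions rather than the whitespace between tokens, so the code converts those positions into character offsets and copies the text between tokens straight from the source.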
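The position-to-offset idea can be checked in isolation. The following is a minimal Python 3 sketch, not code from the recipe (which is Python 2); the helper name `reconstruct` is hypothetical. It rebuilds a source string from `tokenize.generate_tokens` output by emitting each token's text and copying whatever lies between tokens, such as spaces or a backslash line continuation, from the original:

```python
import io
import tokenize


def reconstruct(source):
    """Rebuild `source` from its tokens plus the text found between them."""
    # offsets[row] = character offset where 1-based line `row` starts.
    # Rows are split with the same readline() that feeds the tokenizer,
    # so both agree on where each line begins.
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    pieces = []
    pos = 0  # offset just past the last character emitted
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0]] + tok.start[1]
        if start > pos:
            # Whitespace or a backslash continuation is not a token:
            # copy it from the source unchanged.
            pieces.append(source[pos:start])
        pieces.append(tok.string)
        pos = max(pos, offsets[tok.end[0]] + tok.end[1])
    return "".join(pieces)
```

For ordinary source that tokenizes cleanly, `reconstruct(text) == text`; a colorizer only has to wrap `tok.string` in markup at the point where it is appended.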

When the module is run as a script, the test code at the bottom formats the module's own source and opens the result in a browser.

Python
"""
    MoinMoin - Python Source Parser
"""

# Imports
import cgi, string, sys, cStringIO
import keyword, token, tokenize


#############################################################################
### Python Source Parser (does Highlighting)
#############################################################################

_KEYWORD = token.NT_OFFSET + 1
_TEXT    = token.NT_OFFSET + 2

_colors = {
    token.NUMBER:       '#0080C0',
    token.OP:           '#0000C0',
    token.STRING:       '#004080',
    tokenize.COMMENT:   '#008000',
    token.NAME:         '#000000',
    token.ERRORTOKEN:   '#FF8080',
    _KEYWORD:           '#C00000',
    _TEXT:              '#000000',
}


class Parser:
    """ Send colored python source.
    """

    def __init__(self, raw, out = sys.stdout):
        """ Store the source text.
        """
        self.raw = string.strip(string.expandtabs(raw))
        self.out = out

    def format(self, formatter, form):
        """ Parse and send the colored source.
        """
        # store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        while 1:
            pos = string.find(self.raw, '\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))

        # parse the source and write it
        self.pos = 0
        text = cStringIO.StringIO(self.raw)
        self.out.write('<pre><font face="Lucida,Courier New">')
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            msg = ex[0]
            line = ex[1][0]
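            # escape the unparsed rest of the source so that '<' and '&'
            # in it cannot break the surrounding HTML
            self.out.write("<h3>ERROR: %s</h3>%s\n" % (
                cgi.escape(msg), cgi.escape(self.raw[self.lines[line]:])))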
        self.out.write('</font></pre>')

    def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
        """ Token handler.
        """
        if 0:
            print "type", toktype, token.tok_name[toktype], "text", toktext,
            print "start", srow,scol, "end", erow,ecol, "<br>"

        # calculate new positions
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)

        # handle newlines
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return

        # send the original whitespace, if needed
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])

        # skip indenting tokens
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return

        # map token type to a color group
        if token.LPAR <= toktype and toktype <= token.OP:
            toktype = token.OP
        elif toktype == token.NAME and keyword.iskeyword(toktext):
            toktype = _KEYWORD
        color = _colors.get(toktype, _colors[_TEXT])

        style = ''
        if toktype == token.ERRORTOKEN:
            style = ' style="border: solid 1.5pt #FF0000;"'

        # send text
        self.out.write('<font color="%s"%s>' % (color, style))
        self.out.write(cgi.escape(toktext))
        self.out.write('</font>')


if __name__ == "__main__":
    import os, webbrowser
    print "Formatting..."

    # open own source (sys.argv[0] is this script, whatever its file name)
    source = open(sys.argv[0]).read()

    # write colorized version to "python.html", closing it before display
    out = open('python.html', 'wt')
    Parser(source, out).format(None, None)
    out.close()

    # load HTML page into browser
    if os.name == "nt":
        os.system("explorer python.html")
    else:
        webbrowser.open('file://' + os.path.abspath('python.html'))
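The listing above is Python 2: it relies on `print` statements, the `except X, ex` syntax, tuple parameters in `__call__` (removed by PEP 3113), `cStringIO`, the `string` module functions, `cgi.escape` (removed in Python 3.8) and the old callback form `tokenize.tokenize(readline, tokeneater)`. As a rough guide only, here is a minimal Python 3 sketch of the same technique. It is not MoinMoin code; the names `colorize`, `COLORS`, `KEYWORD_COLOR`, `TEXT_COLOR` and `PLAIN` are illustrative, and it leaves out the recipe's `TokenError` report and the red border on error tokens.

```python
import html
import io
import keyword
import token
import tokenize

# Colors copied from the recipe's _colors table. Python 3's tokenizer
# already reports every operator as token.OP, so no LPAR..OP range check.
COLORS = {
    token.NUMBER: "#0080C0",
    token.OP: "#0000C0",
    token.STRING: "#004080",
    tokenize.COMMENT: "#008000",
    token.NAME: "#000000",
    token.ERRORTOKEN: "#FF8080",
}
KEYWORD_COLOR = "#C00000"
TEXT_COLOR = "#000000"

# Tokens written as plain text: line ends, indentation, end marker
PLAIN = {token.NEWLINE, tokenize.NL, token.INDENT, token.DEDENT, token.ENDMARKER}


def colorize(source):
    """Return `source` as colored HTML, preserving its original layout."""
    # offsets[row] = character offset where 1-based line `row` starts
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    out = ['<pre><font face="Lucida,Courier New">']
    pos = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0]] + tok.start[1]
        # Copy the text between tokens (spaces, backslash continuations)
        if start > pos:
            out.append(html.escape(source[pos:start], quote=False))
        pos = max(pos, offsets[tok.end[0]] + tok.end[1])
        text = html.escape(tok.string, quote=False)  # like cgi.escape
        if tok.type in PLAIN:
            out.append(text)
            continue
        if tok.type == token.NAME and keyword.iskeyword(tok.string):
            color = KEYWORD_COLOR
        else:
            color = COLORS.get(tok.type, TEXT_COLOR)
        out.append('<font color="%s">%s</font>' % (color, text))
    out.append("</font></pre>")
    return "".join(out)
```

If the specific operator matters, `tok.exact_type` gives it (for example `token.LPAR`) while `tok.type` stays `token.OP`.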

Comments

  1. At 7:44 a.m. on 5 Apr 2001, Andy McKay said:

    Thanks. We are using this recipe to colorize this very cookbook. I made a slight change, though, so that the script uses CSS and span tags, which makes the colours easy to change.

  2. At 2:48 p.m. on 18 Apr 2001, Andy McKay said:

    Doesn't handle lines continued with the \ line-continuation character.

  3. At 3:11 p.m. on 19 Apr 2001, Jürgen Hermann (the author) said:

    \ works in some places. In its original environment the code works; see http://purl.net/wiki/python/MoinMoinColorizer. I can't say what causes it not to work in the Cookbook.

  4. At 10:32 p.m. on 16 Sep 2002, Mike Brown said:

    Easy to turn this into an Apache handler that serves up colorized .py files! A few small changes make it possible to invoke this as (a) a CGI script that uses the PATH_TRANSLATED CGI environment variable to know which file to colorize; (b) a command-line tool that takes the file name from its first argument; or (c) a filter that colorizes whatever it reads from stdin. See http://skew.org/~mike/colorize.py for my version.

    To finish setting it up as an Apache handler, so that a request for a .py file is served as colorized HTML, save the script as colorize.cgi (not .py, lest it get confused) and add this to your .htaccess or httpd.conf:

    AddHandler application/x-python .py
    Action application/x-python /full/virtual/path/to/colorize.cgi
    

    Also make sure the Action module (mod_actions) is enabled in your httpd.conf.

  5. At 12:50 p.m. on 4 Jul 2005, Chris Arndt said:

    ... and also as a module! Based on your version, I made some additional enhancements:

    • make script usable as a module

    • use <class> tags and style sheet instead of <style> tags

    • when called as a script, add HTML header and footer

    This version can be found here:

    http://chrisarndt.de/en/software/python/colorize.html
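Comments 1, 4 and 5 describe variants that mark tokens with CSS classes on span tags and run as a CGI script, a command-line tool or a stdin filter. Their actual code lives at the linked URLs and is not reproduced here. The Python 3 sketch below only illustrates the general shape under stated assumptions: the class names (`py-keyword` and so on), `STYLESHEET`, `to_spans`, `page` and `main` are all hypothetical, and source files are assumed to be UTF-8. `PATH_TRANSLATED` and `GATEWAY_INTERFACE` are standard CGI environment variables.

```python
import html
import io
import keyword
import os
import sys
import token
import tokenize

# Hypothetical CSS class names; the style sheet, not inline attributes,
# decides the colors.
CLASSES = {
    token.NUMBER: "py-number",
    token.OP: "py-op",
    token.STRING: "py-string",
    tokenize.COMMENT: "py-comment",
    token.ERRORTOKEN: "py-error",
}
STYLESHEET = """\
.py-number { color: #0080C0; }
.py-op { color: #0000C0; }
.py-string { color: #004080; }
.py-comment { color: #008000; }
.py-keyword { color: #C00000; }
.py-error { border: solid 1.5pt #FF0000; }
"""


def to_spans(source):
    """Wrap each token of `source` in <span class="...">, keeping its layout."""
    offsets = [0, 0]  # offsets[row] = offset where 1-based line `row` starts
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))
    out, pos = [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0]] + tok.start[1]
        if start > pos:  # text between tokens is copied verbatim
            out.append(html.escape(source[pos:start], quote=False))
        pos = max(pos, offsets[tok.end[0]] + tok.end[1])
        text = html.escape(tok.string, quote=False)
        if tok.type == token.NAME and keyword.iskeyword(tok.string):
            css = "py-keyword"
        else:
            css = CLASSES.get(tok.type)  # None: emit as plain text
        out.append('<span class="%s">%s</span>' % (css, text) if css else text)
    return "<pre>" + "".join(out) + "</pre>"


def page(source):
    """Wrap the colorized source in a minimal HTML document with header and footer."""
    return ("<html><head><style>\n%s</style></head><body>\n%s\n</body></html>\n"
            % (STYLESHEET, to_spans(source)))


def main():
    # (a) CGI: the web server passes the requested file in PATH_TRANSLATED
    path = os.environ.get("PATH_TRANSLATED")
    # (b) command line: file name as the first argument
    if path is None and len(sys.argv) > 1:
        path = sys.argv[1]
    if path is None:
        source = sys.stdin.read()  # (c) filter: colorize stdin
    else:
        with open(path, encoding="utf-8") as f:
            source = f.read()
    if "GATEWAY_INTERFACE" in os.environ:  # running as CGI: send a header first
        sys.stdout.write("Content-Type: text/html\r\n\r\n")
    sys.stdout.write(page(source))
```

Calling `main()` from the script's `if __name__ == "__main__":` block would give the three modes, while other code can import the module and call `to_spans()` or `page()` directly.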
