ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Colorize Python source using the built-in tokenizer
Submitter: Jürgen Hermann
Last Updated: 2001/04/06
Version no: 1.2
Category: Programs

 


Description:

This code is part of MoinMoin (http://moin.sourceforge.net/) and converts Python source code to HTML markup, rendering comments, keywords, operators, numeric and string literals in different colors.
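As a rough illustration of how tokens end up in colour groups, here is a minimal Python 3 sketch (the recipe itself is Python 2) that mirrors the mapping Parser.__call__ performs with _colors. The group names and the classify/groups helpers are hypothetical. Only keyword.iskeyword, tokenize.generate_tokens and the token constants come from the standard library.

```python
import io
import keyword
import token
import tokenize

# Hypothetical group names standing in for the recipe's _colors keys.
_GROUPS = {
    token.NUMBER: "number",
    token.OP: "operator",
    token.STRING: "string",
    tokenize.COMMENT: "comment",
    token.ERRORTOKEN: "error",
}


def classify(tok):
    """Return the colour group for one tokenize.TokenInfo."""
    if tok.type == token.NAME and keyword.iskeyword(tok.string):
        return "keyword"
    # Anything not listed (plain names, whitespace tokens) falls back to text.
    return _GROUPS.get(tok.type, "text")


def groups(source):
    """List (group, text) pairs, skipping tokens whose text is only whitespace."""
    readline = io.StringIO(source).readline
    return [(classify(tok), tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.string.strip()]
```

Python 3's tokenize reports every operator as token.OP, so the sketch needs no equivalent of the recipe's LPAR..OP range check.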

The recipe shows how to use the standard-library keyword, token and tokenize modules to scan Python source code and re-emit it with no changes to its original formatting (which is the hard part).

The test code at the bottom of the module formats itself and launches a browser with the result.
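The hard part mentioned in the description, re-emitting the source unchanged, comes down to turning each token's (row, column) start into an absolute offset and copying whatever lies between tokens straight from the source. A minimal Python 3 sketch of that idea follows. roundtrip is a hypothetical helper, and it writes token text unescaped so the result can be compared with the input.

```python
import io
import tokenize


def roundtrip(source):
    """Rebuild source from token text plus the raw text between tokens."""
    # offsets[row] is where 1-based row starts in source; index 0 is a
    # dummy, mirroring the recipe's self.lines = [0, 0] trick.
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    out = []
    pos = 0  # end of the text emitted so far
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0]] + tok.start[1]
        end = offsets[tok.end[0]] + tok.end[1]
        if start > pos:
            # Spaces, backslash continuations, etc.: copy them verbatim.
            out.append(source[pos:start])
        out.append(tok.string)
        pos = max(pos, end)
    out.append(source[pos:])  # anything after the last token
    return "".join(out)
```

In the HTML version, the only change is that tok.string is escaped and wrapped in markup before it is appended.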

Source:

"""
    MoinMoin - Python Source Parser
"""

# Imports
import cgi, string, sys, cStringIO
import keyword, token, tokenize


#############################################################################
### Python Source Parser (does Highlighting)
#############################################################################

_KEYWORD = token.NT_OFFSET + 1
_TEXT    = token.NT_OFFSET + 2

_colors = {
    token.NUMBER:       '#0080C0',
    token.OP:           '#0000C0',
    token.STRING:       '#004080',
    tokenize.COMMENT:   '#008000',
    token.NAME:         '#000000',
    token.ERRORTOKEN:   '#FF8080',
    _KEYWORD:           '#C00000',
    _TEXT:              '#000000',
}


class Parser:
    """ Send colored python source.
    """

    def __init__(self, raw, out = sys.stdout):
        """ Store the source text.
        """
        self.raw = string.strip(string.expandtabs(raw))
        self.out = out

    def format(self, formatter, form):
        """ Parse and send the colored source.
        """
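        # store line start offsets in self.lines; tokenize numbers rows from 1,
        # so self.lines[row] is the offset of that row and index 0 is a dummy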
        self.lines = [0, 0]
        pos = 0
        while 1:
            pos = string.find(self.raw, '\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))

        # parse the source and write it
        self.pos = 0
        text = cStringIO.StringIO(self.raw)
        self.out.write('<pre><font face="Lucida,Courier New">')
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            msg = ex[0]
            line = ex[1][0]
            self.out.write("<h3>ERROR: %s</h3>%s\n" % (
                cgi.escape(msg), cgi.escape(self.raw[self.lines[line]:])))
        self.out.write('</font></pre>')

    def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
        """ Token handler.
        """
        if 0:
            print "type", toktype, token.tok_name[toktype], "text", toktext,
            print "start", srow,scol, "end", erow,ecol, "<br>"

        # calculate new positions
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)

        # handle newlines
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return

        # send the original whitespace, if needed
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])

        # skip indenting tokens
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return

        # map token type to a color group
        if token.LPAR <= toktype and toktype <= token.OP:
            toktype = token.OP
        elif toktype == token.NAME and keyword.iskeyword(toktext):
            toktype = _KEYWORD
        color = _colors.get(toktype, _colors[_TEXT])

        style = ''
        if toktype == token.ERRORTOKEN:
            style = ' style="border: solid 1.5pt #FF0000;"'

        # send text
        self.out.write('<font color="%s"%s>' % (color, style))
        self.out.write(cgi.escape(toktext))
        self.out.write('</font>')


if __name__ == "__main__":
    import os, webbrowser
    print "Formatting..."

    # open own source (whatever name this module file was saved under)
    source = open(__file__).read()

    # write colorized version to "python.html"
    out = open('python.html', 'wt')
    Parser(source, out).format(None, None)
    out.close()

    # load HTML page into the default browser
    webbrowser.open(os.path.abspath('python.html'))
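The error branch in format() relies on tokenize.TokenError carrying a message plus a (row, column) pair, a shape Python 3 keeps. A hedged Python 3 sketch of the same recovery path follows. tokenize_or_error and error_html are hypothetical names; the sketch escapes both the message and the remaining source with html.escape, since either may contain < or &. The exact message and row can vary between Python versions, so treat the row as approximate.

```python
import html
import io
import tokenize


def tokenize_or_error(source):
    """Return (tokens, None), or (tokens_so_far, (message, row)) on failure."""
    tokens = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            tokens.append(tok)
    except tokenize.TokenError as ex:
        message, (row, _col) = ex.args  # same shape the recipe indexes
        return tokens, (message, row)
    return tokens, None


def error_html(source, message, row):
    """Render the error banner and the source from `row` on, both escaped."""
    rest = "".join(source.splitlines(keepends=True)[max(row, 1) - 1:])
    return "<h3>ERROR: %s</h3>%s" % (html.escape(message), html.escape(rest))
```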

Discussion:




Number of comments: 5

Thanks, andy mckay, 2001/04/05
We are using this recipe to colorize this very cookbook, thanks. I made a slight change, though, so that the script uses CSS and span elements to allow easy colour manipulation.
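The CSS-and-span variant described in this comment might look roughly like the following Python 3 sketch. The py-* class names, the CSS_CLASS table and the span helper are assumptions, not andy mckay's actual change. The colours are copied from the recipe's _colors table so the style sheet reproduces the original look.

```python
import html
import keyword
import token
import tokenize

# Hypothetical class names; colours copied from the recipe's _colors table.
CSS_CLASS = {
    token.NUMBER: "py-number",
    token.OP: "py-op",
    token.STRING: "py-string",
    tokenize.COMMENT: "py-comment",
    token.ERRORTOKEN: "py-error",
}

STYLESHEET = """\
.py-number  { color: #0080C0; }
.py-op      { color: #0000C0; }
.py-string  { color: #004080; }
.py-comment { color: #008000; }
.py-keyword { color: #C00000; }
.py-error   { color: #FF8080; border: solid 1.5pt #FF0000; }
"""


def span(tok):
    """Return one token's escaped text, wrapped in a classed <span> if styled."""
    if tok.type == token.NAME and keyword.iskeyword(tok.string):
        css = "py-keyword"
    else:
        css = CSS_CLASS.get(tok.type)
    text = html.escape(tok.string)
    if css is None:
        return text  # plain names and whitespace tokens keep the default colour
    return '<span class="%s">%s</span>' % (css, text)
```

Changing the colour scheme then means editing STYLESHEET only; the generated markup stays the same.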

\, andy mckay, 2001/04/18
Doesn't handle lines continued with the \ operator.

\ works, at some places, Jürgen Hermann, 2001/04/19
In its original environment the code works; see http://purl.net/wiki/python/MoinMoinColorizer. I can't say what causes it not to work in the Cookbook.
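One way to see why the technique can cope with backslash continuations: tokenize emits no token for the backslash or the newline after it, so that text falls into a gap the parser copies verbatim from the raw source. The hypothetical gaps helper below, a Python 3 sketch, collects those gaps for inspection.

```python
import io
import tokenize


def gaps(source):
    """Return the stretches of source that no token covers, in order."""
    offsets = [0, 0]  # offsets[row] = start of 1-based row, as in the recipe
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))
    found, pos = [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0]] + tok.start[1]
        if start > pos:
            found.append(source[pos:start])
        pos = max(pos, offsets[tok.end[0]] + tok.end[1])
    return found
```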

Easy to turn this into an Apache handler - serve up colorized .py files!, Mike Brown, 2002/09/16
Some small changes make it possible to invoke this as (a) a CGI script that uses the PATH_TRANSLATED CGI environment variable to know which file to colorize; (b) a command-line tool that takes the filename from its first argument; or (c) a filter that colorizes whatever it gets from stdin. See http://skew.org/~mike/colorize.py for my version.

To finish setting it up as an Apache handler, so that requesting a .py file serves it up as colorized HTML, save the script as colorize.cgi (not .py, lest it get confused) and add this to your .htaccess or httpd.conf:

AddHandler application/x-python .py
Action application/x-python /full/virtual/path/to/colorize.cgi
Also make sure the Action module (mod_actions) is enabled in your httpd.conf.
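The three invocation modes described in this comment can be sketched in Python 3 as a small dispatch step. choose_input and read_source are hypothetical names, not taken from Mike Brown's script, and checking PATH_TRANSLATED before the command line is an assumption.

```python
import os
import sys


def choose_input(environ, argv):
    """Return the file to colorize, or None to mean "read standard input"."""
    path = environ.get("PATH_TRANSLATED")
    if path:
        return path      # (a) CGI: the file the requested URL maps to
    if len(argv) > 1:
        return argv[1]   # (b) command line: first argument
    return None          # (c) filter: whatever arrives on stdin


def read_source(environ=None, argv=None, stdin=None):
    """Read the selected source, defaulting to the real process environment."""
    environ = os.environ if environ is None else environ
    argv = sys.argv if argv is None else argv
    stdin = sys.stdin if stdin is None else stdin
    path = choose_input(environ, argv)
    if path is None:
        return stdin.read()
    with open(path) as f:
        return f.read()
```

In CGI mode the script must also print a Content-Type: text/html header and a blank line before the HTML.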

... and also as a module!, Chris Arndt, 2005/07/04
Based on your version, I made some additional enhancements:

- make script usable as a module
- use class attributes and a style sheet instead of inline style attributes
- when called as a script, add HTML header and footer

This version can be found here:

http://chrisarndt.de/en/software/python/colorize.html
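The module-versus-script split listed in this comment could be organised roughly as below: a Python 3 sketch in which the module-level colorize returns only an HTML fragment, and the header and footer are added only when the file runs as a script. The names, the HEADER/FOOTER text and the colorize.css style sheet are hypothetical, and colorize is a stand-in that merely escapes the text; a real version would emit the per-token markup.

```python
import html
import sys

HEADER = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>%s</title>
<link rel="stylesheet" href="%s"></head><body>
"""
FOOTER = "</body></html>\n"


def colorize(source):
    """Module API: return an HTML fragment only, with no page header/footer.

    Stand-in body: it just escapes the text; a real version would emit the
    per-token markup.
    """
    return "<pre>%s</pre>" % html.escape(source)


def as_page(fragment, title, stylesheet="colorize.css"):
    """Wrap a fragment in a complete page (done only in script mode)."""
    return HEADER % (html.escape(title), html.escape(stylesheet)) + fragment + FOOTER


def main(argv):
    """Script entry point: write each named file as a full page to stdout."""
    for path in argv[1:]:
        with open(path) as f:
            sys.stdout.write(as_page(colorize(f.read()), path))


if __name__ == "__main__":
    main(sys.argv)
```

Importing the module gives callers the bare fragment to embed in their own pages, while running it directly produces standalone HTML files.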



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. a friendly mkdir()

4. SOLVING THE METACLASS ...

5. Povray for python

6. Changing return value ...

7. Implementation of sets ...

8. bag collection class

9. deque collection class

10. Floating Point Simulator




© 2006 ActiveState Software Inc. All rights reserved.