Welcome, guest | Sign In | My Account | Store | Cart

The HTMLTags module defines a class for each valid HTML tag, written in uppercase letters. To create a piece of HTML, the general syntax is :

t = TAG(innerHTML, key1=val1,key2=val2,...)

so that "print t" results in :

<TAG key1="val1" key2="val2" ...>innerHTML</TAG>

For instance :

print A('bar', href="foo") ==> <A href="foo">bar</A>
Python, 216 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
"""Classes to generate HTML in Python

The HTMLTags module defines a class for all the valid HTML tags, written in
uppercase letters. To create a piece of HTML, the general syntax is :
    t = TAG(inner_HTML, key1=val1,key2=val2,...)

so that "print t" results in :
    <TAG key1="val1" key2="val2" ...>inner_HTML</TAG>

For instance :
    print A('bar', href="foo") ==> <A href="foo">bar</A>

To generate HTML attributes without value, give them the value True :
    print OPTION('foo',SELECTED=True,value=5) ==> 
            <OPTION value="5" SELECTED>

The inner_HTML argument can be an instance of an HTML class, so that 
you can nest tags, like this :
    print B(I('foo')) ==> <B><I>foo</I></B>

TAG instances support addition :
    print B('bar')+INPUT(name="bar") ==> <B>bar</B><INPUT name="bar">

and repetition :
    print TH('&nbsp')*3 ==> <TD>&nbsp;</TD><TD>&nbsp;</TD><TD>&nbsp;</TD>

For complex expressions, a tag can be nested in another using the operator <= 
Considering the HTML document as a tree, this means "add child" :

    form = FORM(action="foo")
    form <= INPUT(name="bar")
    form <= INPUT(Type="submit",value="Ok")

If you have a list (or any iterable) of instances, you can't concatenate the 
items with sum(instance_list) because sum takes only numbers as arguments. So 
there is a function called Sum() which will do the same :

    Sum( TR(TD(i)+TD(i*i)) for i in range(100) )

generates the rows of a table showing the squares of integers from 0 to 99

A simple document can be produced by :
    print HTML( HEAD(TITLE('Test document')) +
        BODY(H1('This is a test document')+
             'First line'+BR()+
             'Second line'))

This will be rendered as :
    <HTML>
    <HEAD>
    <TITLE>Test document</TITLE>
    </HEAD>
    <BODY>
    <H1>This is a test document</H1>
    First line
    <BR>
    Second line
    </BODY>
    </HTML>

If the document is more complex it is more readable to create the elements 
first, then to print the whole result in one instruction. For example :

head = HEAD()
head <= TITLE('Record collection')
head <= LINK(rel="Stylesheet",href="doc.css")

title = H1('My record collection')
table = TABLE()
table <= TR(TH('Title')+TH('Artist'))
for rec in records:
    row = TR()
    # note the attribute key Class with leading uppercase 
    # because "class" is a Python keyword
    row <= TD(rec.title,Class="title")+TD(rec.artist,Class="artist")
    table <= row

print HTML(head+BODY(title+table))
"""

import cStringIO

class TAG:
    """Generic class for tags"""
    def __init__(self, inner_HTML="", **attrs):
        self.tag = self.__class__.__name__
        self.inner_HTML = inner_HTML
        self.attrs = attrs
        self.children = []
        self.brothers = []
    
    def __str__(self):
        res=cStringIO.StringIO()
        w=res.write
        if self.tag != "TEXT":
            w("<%s" %self.tag)
            # attributes which will produce arg = "val"
            attr1 = [ k for k in self.attrs 
                if not isinstance(self.attrs[k],bool) ]
            w("".join([' %s="%s"' 
                %(k.replace('_','-'),self.attrs[k]) for k in attr1]))
            # attributes with no argument
            # if value is False, don't generate anything
            attr2 = [ k for k in self.attrs if self.attrs[k] is True ]
            w("".join([' %s' %k for k in attr2]))
            w(">")
        if self.tag in ONE_LINE:
            w('\n')
        w(str(self.inner_HTML))
        for child in self.children:
            w(str(child))
        if self.tag in CLOSING_TAGS:
            w("</%s>" %self.tag)
        if self.tag in LINE_BREAK_AFTER:
            w('\n')
        if hasattr(self,"brothers"):
            for brother in self.brothers:
                w(str(brother))
        return res.getvalue()

    def __le__(self,other):
        """Add a child"""
        if isinstance(other,str):
            other = TEXT(other)
        self.children.append(other)
        other.parent = self
        return self

    def __add__(self,other):
        """Return a new instance : concatenation of self and another tag"""
        res = TAG()
        res.tag = self.tag
        res.inner_HTML = self.inner_HTML
        res.attrs = self.attrs
        res.children = self.children
        res.brothers = self.brothers + [other]
        return res

    def __radd__(self,other):
        """Used to add a tag to a string"""
        if isinstance(other,str):
            return TEXT(other)+self
        else:
            raise ValueError,"Can't concatenate %s and instance" %other

    def __mul__(self,n):
        """Replicate self n times, with tag first : TAG * n"""
        res = TAG()
        res.tag = self.tag
        res.inner_HTML = self.inner_HTML
        res.attrs = self.attrs
        for i in range(n-1):
            res += self
        return res

    def __rmul__(self,n):
        """Replicate self n times, with n first : n * TAG"""
        return self*n

# list of tags, from the HTML 4.01 specification

CLOSING_TAGS =  ['A', 'ABBR', 'ACRONYM', 'ADDRESS', 'APPLET',
            'B', 'BDO', 'BIG', 'BLOCKQUOTE', 'BUTTON',
            'CAPTION', 'CENTER', 'CITE', 'CODE',
            'DEL', 'DFN', 'DIR', 'DIV', 'DL',
            'EM', 'FIELDSET', 'FONT', 'FORM', 'FRAMESET',
            'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
            'I', 'IFRAME', 'INS', 'KBD', 'LABEL', 'LEGEND',
            'MAP', 'MENU', 'NOFRAMES', 'NOSCRIPT', 'OBJECT',
            'OL', 'OPTGROUP', 'PRE', 'Q', 'S', 'SAMP',
            'SCRIPT', 'SELECT', 'SMALL', 'SPAN', 'STRIKE',
            'STRONG', 'STYLE', 'SUB', 'SUP', 'TABLE',
            'TEXTAREA', 'TITLE', 'TT', 'U', 'UL',
            'VAR', 'BODY', 'COLGROUP', 'DD', 'DT', 'HEAD',
            'HTML', 'LI', 'P', 'TBODY','OPTION', 
            'TD', 'TFOOT', 'TH', 'THEAD', 'TR']

NON_CLOSING_TAGS = ['AREA', 'BASE', 'BASEFONT', 'BR', 'COL', 'FRAME',
            'HR', 'IMG', 'INPUT', 'ISINDEX', 'LINK',
            'META', 'PARAM']

# create the classes
for tag in CLOSING_TAGS + NON_CLOSING_TAGS + ['TEXT']:
    exec("class %s(TAG): pass" %tag)
    
def Sum(iterable):
    """Return the concatenation of the instances in the iterable
    Can't use the built-in sum() on non-integers"""
    it = [ item for item in iterable ]
    if it:
        return reduce(lambda x,y:x+y, it)
    else:
        return ''

# whitespace-insensitive tags, determines pretty-print rendering
LINE_BREAK_AFTER = NON_CLOSING_TAGS + ['HTML','HEAD','BODY',
    'FRAMESET','FRAME',
    'TITLE','SCRIPT',
    'TABLE','TR','TD','TH','SELECT','OPTION',
    'FORM',
    'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
    ]
# tags whose opening tag should be alone in its line
ONE_LINE = ['HTML','HEAD','BODY',
    'FRAMESET'
    'SCRIPT',
    'TABLE','TR','TD','TH','SELECT','OPTION',
    'FORM',
    ]

if __name__ == '__main__':
    head = HEAD(TITLE('Test document'))
    body = BODY()
    body <= H1('This is a test document')
    body <= 'First line' + BR() + 'Second line'
    print HTML(head + body)

This module makes it easier to produce HTML than writing the raw HTML code in strings. Since opening and closing tags are generated, the resulting HTML should be clean, with no risk of forgetting to close a tag or misspelling a tag

10 comments

Kjell Magne Fauske 19 years, 2 months ago  # | flag

Tags should be in lowercase. According to the xhtml specification, html tags must be in lowercase. This makes them compatible with xml.

Fred P 19 years, 2 months ago  # | flag

"Prior Art" - not patentable :-). Something very similar was done some years ago by Andy Dustman in his HyperText package :

http://dustman.net/andy/python/HyperText

I use it instead of HTMLgen to create web pages in cgi scripts, and it works very well. Too bad it's not better known ! Its object-oriented approach (nesting calls like you nest tags in HTML) makes it simpler and more natural to use than Pierre's solution above, IMHO.

Pierre Quentel (author) 19 years, 2 months ago  # | flag

HyperText. The idea is so simple that I was surprised no-one had done this before, I googled for "generate HTML in Python" which returned HTMLGen and templating systems, but not HyperText. It is indeed almost the same, with only a slight difference in the syntax (it uses TAG(*args,attrs) instead of TAG(arg1+arg2+...,attrs) ) ; besides, it's a complete package, also supporting SGML, XHTML etc. Thanks for mentioning it, it deserves to be better known

By the way, nesting tags - TABLE(TR(TD('foo')+TD('bar'))) - is also supported by HTMLTags

vincent delft 19 years, 1 month ago  # | flag

Pyhtmloo. Some times ago I've just written a similar python module called pyhtmloo. It does more or less the same as yours. I've even write a parser, that build pyhtmloo objects from an html page. This way the circle is closed ;-).

You can check it at: http://pyhtmloo.sourceforge.net

Georgy Pruss 18 years, 8 months ago  # | flag

Did it myself too. Just yesterday I wrote nearly the same module - it's easier to write it than to find it :) I also added ability to add and access items and attributes with shift and indexing, and __str__() for converting to text, e.g.:

html = HTML() # there is also general TAG('type',...)
html &lt;&lt; HEAD( TITLE( 'The Title' ) ) # same as HEAD() &lt;&lt; TITLE(...)
# html[0] will be HEAD, and html[0][0] - TITLE, html[0][0][0] - 'The Title'
# indexing with numbers sets/gets items, with strings -- attributes
body = BODY( bgcolor="#E0E0E0" ) &lt;&lt; P( 'Para 1' ) &lt;&lt; P( 'Para 2' )
body['text'] = 'black' # can get/set attribute any time
body['link'] = 'blue'
html &lt;&lt; body # can add itme any time
# etc ...
print html
&lt;html>
&lt;head>
&lt;title>The Title&lt;/title>
&lt;/head>
&lt;body bgcolor="#E0E0E0" text="black" link="blue">
&lt;p>
Para 1&lt;/p>
&lt;p>
Para 2&lt;/p>
&lt;/body>
&lt;/html>
Andy Dustman 18 years, 4 months ago  # | flag

HyperText's roots. HyperText does have it's root's in HTMLGen. It's not based on HTMLGen, but it does borrow a few ideas. Another more modern package which is similar, but has much more support for XML in general is XIST. http://www.livinglogic.de/Python/xist/

Pierre Quentel (author) 16 years, 7 months ago  # | flag

Bug fix. Version 1.6 fixes a bug in the __mul__ method

Pierre Quentel (author) 14 years, 10 months ago  # | flag

Revision 8 : add the __radd__ method, allowing for a syntax like

"Hello, "+B("world !")
Al 14 years, 10 months ago  # | flag

I tried your recipe and I have the following query with respect to the code below. Naively that would create the same page twice... but the body gets added to the head variable when called a second time.

<HTML>
<HEAD>Head</HEAD>
<BODY>Hi</BODY>
<BODY>Hi</BODY></HTML>

Is this a feature or I'm missing something? I can see that this would happen from the add operator because it adds to self. But I'm no expert I have only programming for a few hours.

Perhaps I'm not using the tags correctly.

Cheers, Al

import htmltag
from htmltag import *
import os
import sys

def Write_Html(actual_head,actual_body):

  actual_html = HTML(actual_head+actual_body)

  print '**************'
  print actual_html
  print '**************'

head = HEAD('Head')
body = BODY('Hi')
Write_Html(head,body)
Write_Html(head,body)
Pierre Quentel (author) 14 years, 6 months ago  # | flag

The revision posted today fixes the bug mentioned by Al

It also introduces another, more readable syntax to nest tags : instead of using parenthesis, which can become difficult to read, a tag can be nested in another one using the <= operator, like in

form = FORM(action="foo",method="post")
form <= INPUT(name="bar")
form <= INPUT(Type="submit",value="Ok")

Here the operator can be read as a method "add child" to the left operand. It has no logical relation with "less or equal", but graphically it has the shape of a left arrow, which makes it natural for the "add child" method

Created by Pierre Quentel on Sat, 5 Feb 2005 (PSF)
Python recipes (4591)
Pierre Quentel's recipes (10)

Required Modules

Other Information and Tasks