zc.htmlchecker | Python Package Manager Index (PyPM)

How to install zc.htmlchecker

Download and install ActivePython
Open Command Prompt
Type pypm install zc.htmlchecker

Python versions: 2.7, 3.2, 3.3
Platforms: Windows (32-bit), Windows (64-bit), Mac OS X (10.5+), Linux (32-bit), Linux (64-bit)
Version 0.1.0: Available

Author

Jim Fulton

License

ZPL 2.1

Dependencies

Imports

zc.htmlchecker

Latest release

version 0.1.0 on Jan 9th, 2014

HTML/DOM Checker

When testing code (like widgets) that generates DOM nodes, we want to be able to make assertions about what matters. Examples of things we'd like to ignore:

attribute order
extra attributes
extra classes
extra nodes

zc.htmlchecker provides a checker object that can be used by itself, or as a doctest output checker.
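As a rough illustration of what ignoring extra attributes and extra classes can mean, here is a minimal sketch in plain Python. It is not taken from zc.htmlchecker; the function name attributes_match and the dict-based representation of attributes are assumptions made for this example.

```python
def attributes_match(expected, observed):
    """Return True if every expected attribute is satisfied by observed.

    Both arguments are plain dicts mapping attribute names to string
    values. This is a hypothetical helper, not the library's own code.
    """
    for name, value in expected.items():
        if name not in observed:
            # An expected attribute must be present.
            return False
        if name == "class":
            # Classes are compared as sets, so extra observed classes
            # and a different class order are both tolerated.
            if not set(value.split()) <= set(observed[name].split()):
                return False
        elif observed[name] != value:
            return False
    # Attributes that appear only in observed are ignored entirely.
    return True


print(attributes_match({"class": "mybutton"},
                       {"x": "1", "class": "widget mybutton"}))
```

Because dicts are compared by key, attribute order never enters into the comparison.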

Contents

HTML/DOM Checker
Changes
- 0.1.0 2013-08-31

Getting started

Let's look at some examples.

Here's a sample expected string:

<body>
  <button class="mybutton">press me</button>
</body>

Let's create a checker:

>>> import zc.htmlchecker
>>> checker = zc.htmlchecker.HTMLChecker()

You can call its check method with expected and observed HTML:

>>> checker.check(
... expected,
... """<html><body><button x='1' class="widget mybutton">press me</button>
...          </body></html>""")

If there's a match, then nothing is returned. For there to be a match, the expected output merely has to be unambiguously found in the observed output. In the above example, there was a single body tag, so it knew how to do the match. Note that whitespace differences were ignored, as were extra observed attributes and an extra class.
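The walkthrough refers to expected without showing its assignment. A minimal sketch of one way to set it up follows, assuming expected is simply a Python string holding the sample above. The helper name matches is hypothetical, and zc.htmlchecker (with the parser it needs) is assumed to be installed when the helper is called.

```python
# The expected and observed HTML from the walkthrough, as plain strings.
expected = """
<body>
  <button class="mybutton">press me</button>
</body>
"""

observed = """<html><body><button x='1' class="widget mybutton">press me</button>
         </body></html>"""


def matches(expected, observed):
    """Return True if the checker accepts observed, False on MatchError.

    Hypothetical convenience wrapper. The import is deferred so that this
    module can be loaded even where zc.htmlchecker is not installed.
    """
    import zc.htmlchecker

    checker = zc.htmlchecker.HTMLChecker()
    try:
        checker.check(expected, observed)
    except zc.htmlchecker.MatchError:
        return False
    return True
```

Wrapping check in a boolean helper is only a convenience; in a test you would usually let MatchError propagate so that its message shows the expected and observed markup.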

doctest Checker

To use zc.htmlchecker as a doctest checker, pass an instance of HTMLChecker as an output checker when setting up your doctests.

When used as a doctest checker, expected text that doesn't start with < is checked with the default checker, or a checker you pass in as base.

You may want to have some html examples checked with another checker. In that case, you can specify a prefix. Only examples that begin with the prefix will be checked with the HTML checker, and the prefix will be removed.
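The dispatch rule described above can be sketched with the standard library alone. This is an illustrative stand-in, not the HTMLChecker implementation: the class name PrefixDispatchChecker, the html_check callable, and the keyword names base and prefix are assumptions made for this example.

```python
import doctest


class PrefixDispatchChecker(doctest.OutputChecker):
    """Route HTML-looking expected output to a separate comparison.

    html_check is any callable taking (want, got) and returning a bool.
    """

    def __init__(self, html_check, base=None, prefix=None):
        self.html_check = html_check
        # Fall back to doctest's stock checker when no base is given.
        self.base = base or doctest.OutputChecker()
        self.prefix = prefix

    def check_output(self, want, got, optionflags):
        stripped = want.lstrip()
        if self.prefix:
            # With a prefix, only prefixed examples are treated as HTML,
            # and the prefix is removed before comparing.
            if stripped.startswith(self.prefix):
                return self.html_check(stripped[len(self.prefix):], got)
        elif stripped.startswith("<"):
            # Without a prefix, anything starting with "<" is HTML.
            return self.html_check(want, got)
        # Everything else goes to the base checker.
        return self.base.check_output(want, got, optionflags)
```

With the real library, the HTMLChecker instance would typically be handed to doctest.DocFileSuite or doctest.DocTestSuite through their checker argument.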

Expecting multiple nodes

We can expect more than a single node:

<button>Cancel</button>
<button>Save</button>

This example expects 2 button nodes somewhere in the output.

>>> checker.check(expected,
... """<html><body>
...         <button id='cancel_button' class="button">Cancel</button>
...         <button id='save_button' class="button">Save</button>
...    </body></html>""")

But if there isn't a match, it can be harder to figure out what's wrong:

>>> checker.check(expected,
... """<html><body>
...         <button id='cancel_button' class="button">Cancel</button>
...         <button id='save_button' class="button">OK</button>
...    </body></html>""")
Traceback (most recent call last):
...
MatchError: Couldn't find wildcard match
Expected:
<button>
 Save
</button>
<BLANKLINE>
Observed:
<html>
 <body>
  <button class="button" id="cancel_button">
   Cancel
  </button>
  <button class="button" id="save_button">
   OK
  </button>
 </body>
</html>

We'll come back to wildcard matches in a bit. Here, the matcher detected that one of the expected buttons had no match, but it couldn't say which observed button was the problem. We can make its job easier using ids:

<button id='cancel_button'>Cancel</button>
<button id='save_button'>Save</button>

Now we're looking for button nodes with specific ids.

>>> checker.check(expected,
... """<html><body>
...         <button id='cancel_button' class="button">Cancel</button>
...         <button id='save_button' class="button">OK</button>
...    </body></html>""")
Traceback (most recent call last):
...
MatchError: text nodes differ u'Save' != u'OK'
Expected:
<button id="save_button">
 Save
</button>
<BLANKLINE>
Observed:
<button class="button" id="save_button">
 OK
</button>
<BLANKLINE>

That's a lot more helpful.

Wildcards

Speaking of wildcard matches, sometimes you want to ignore intermediate nodes. You can do this by putting an ellipsis at the top of a node that has intermediate nodes you want to ignore:

<form>
  ...
  <button id='cancel_button'>Cancel</button>
  <button id='save_button'>Save</button>
</form>

In this case, we want to find button nodes inside a form node. We don't care if there are intermediate nodes.

>>> checker.check(expected,
... """<html><body>
...    <form>
...      <div>
...         <button id='cancel_button' class="button">Cancel</button>
...         <button id='save_button' class="button">Save</button>
...      </div>
...    </form>
...    </body></html>""")
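The difference between matching direct children and matching at any depth can be illustrated with the standard library's xml.etree. This is only an analogy: zc.htmlchecker is not based on xml.etree, and the approach works here only because the observed markup happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET

observed = """<html><body>
   <form>
     <div>
        <button id='cancel_button' class="button">Cancel</button>
        <button id='save_button' class="button">Save</button>
     </div>
   </form>
   </body></html>"""

root = ET.fromstring(observed)
form = root.find(".//form")

# Strict matching: only buttons that are direct children of the form.
direct_ids = [b.get("id") for b in form.findall("button")]

# Wildcard-style matching: buttons at any depth below the form,
# which skips over the intermediate div.
nested_ids = [b.get("id") for b in form.iter("button")]

print(direct_ids)  # []
print(nested_ids)  # ['cancel_button', 'save_button']
```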

When looking for expected text, we basically do a wildcard match on the observed text.

Sometimes, we want to check for text nodes that may be embedded in some generated construct that we can't control (like a grid produced by a library). To do that, include a text node that starts with a line containing an ellipsis. For example, we may expect a grid/table with some data:

<div id="mygrid" name="">
...
Name    Favorite Color
Sally   Red
Bill    Blue
</div>

We don't know exactly how our library is going to wrap the data, so we just test for the presence of the data.

>>> import sys
>>> try: checker.check(expected,
... """<html><body>
...      <div id='mygrid' name='' xid="1">
...        <table>
...          <tr><th>Name</th><th>Favorite Color</th></tr>
...          <tr><td>Sally</td><td>Red  </td></tr>
...          <tr><td>Bill </td><td>Green</td></tr>
...        </table>
...      </div>
...    </body></html>""")
... except zc.htmlchecker.MatchError:
...    error = sys.exc_info()[1]
... else: print('oops')
>>> print(error) # doctest: +ELLIPSIS
Blue not found in text content.
...

>>> checker.check(expected,
... """<html><body>
...      <div id='mygrid' name='' xid="1">
...        <table>
...          <tr><th>Name</th><th>Favorite Color</th></tr>
...          <tr><td>Sally</td><td>Red  </td></tr>
...          <tr><td>Bill </td><td>Blue</td></tr>
...        </table>
...      </div>
...    </body></html>""")
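A text wildcard of this kind can be approximated by flattening the observed markup to its text content and looking for each expected word. The sketch below uses only the standard library. The word-by-word comparison is an assumption suggested by the "Blue not found in text content." message, and the names TextCollector and words_missing_from_text are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_missing_from_text(expected_text, observed_html):
    """Return the expected words that do not occur in the observed text."""
    collector = TextCollector()
    collector.feed(observed_html)
    collector.close()
    # Normalize whitespace so that layout differences do not matter.
    content = " ".join(" ".join(collector.chunks).split())
    return [word for word in expected_text.split() if word not in content]


expected_text = """
Name    Favorite Color
Sally   Red
Bill    Blue
"""
observed_html = """<div id='mygrid'><table>
  <tr><th>Name</th><th>Favorite Color</th></tr>
  <tr><td>Sally</td><td>Red  </td></tr>
  <tr><td>Bill </td><td>Green</td></tr>
</table></div>"""

print(words_missing_from_text(expected_text, observed_html))  # ['Blue']
```

This simplified version uses substring tests and ignores word order, so it is looser than a real checker would need to be.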

You can use other BeautifulSoup parsers

HTMLChecker uses BeautifulSoup. It uses the 'html5lib' parser by default, but you can pass a different parser name. You probably want to steer clear of the 'html.parser' parser, as it's buggy:

>>> checker = zc.htmlchecker.HTMLChecker(parser='html.parser')
>>> checker.check('<input id="x">', '<input id="x"><input>')
Traceback (most recent call last):
...
MatchError: Wrong number of children 1!=0
Expected:
<input id="x"/>
<BLANKLINE>
Observed:
<input id="x">
 <input/>
</input>

Here, 'html.parser' decided that the input tags needed closing tags, even though the HTML input tag is empty. This is likely in part because the underlying parser is an XHTML parser.
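One way to see where such nesting can come from is to look at the events Python's standard html.parser module reports. The sketch below is independent of BeautifulSoup and zc.htmlchecker. The helper name tag_events is hypothetical, and how a given BeautifulSoup version turns these events into a tree may differ from the output shown above.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record which tag events the tokenizer emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XML-style self-closing tags such as <input/>.
        self.events.append(("startend", tag))


def tag_events(html):
    logger = EventLogger()
    logger.feed(html)
    logger.close()
    return logger.events


# HTML-style empty tags produce start events only, with no end events.
print(tag_events('<input id="x"><input>'))
# [('start', 'input'), ('start', 'input')]

# Only the self-closing form is reported as a complete element.
print(tag_events('<input id="x"/>'))
# [('startend', 'input')]
```

Since the tokenizer never signals that the first input has ended, any tree builder layered on top has to know for itself that input takes no content.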

Changes

0.1.0 2013-08-31

Initial release.
