Remove whitespace-only text nodes from an XML DOM (Python recipe) by Brian Quinlan
ActiveState Code (http://code.activestate.com/recipes/303061/)

XML parsers consider several conditions when deciding which whitespace-only text nodes should be preserved during DOM construction. Unfortunately, those conditions are controlled by the document's DTD or by the content of the document itself. Since it is often difficult to modify the DTD or the XML, this recipe simply removes all whitespace-only text nodes from a DOM node.
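
To make the problem concrete, here is a minimal sketch, assuming the standard library's xml.dom.minidom as the DOM implementation (the recipe itself does not name one). The sample document and the names `doc`, `children`, and `kinds` are made up for illustration. It labels each child of the root element so the whitespace-only text nodes produced by indentation become visible.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the newlines and indentation around <b> become
# whitespace-only text nodes in the resulting DOM.
doc = minidom.parseString("<a>\n  <b>text</b>\n</a>")
children = doc.documentElement.childNodes

# Label each child of <a>: either "whitespace" or the node's name.
kinds = [
    "whitespace"
    if child.nodeType == Node.TEXT_NODE and not child.data.strip()
    else child.nodeName
    for child in children
]
print(kinds)  # ['whitespace', 'b', 'whitespace']
```

Code that walks childNodes expecting only the b element has to skip the two extra nodes, which is the situation this recipe is meant to avoid.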

from xml import dom

def remove_whilespace_nodes(node, unlink=False):
    """Removes all of the whitespace-only text descendants of a DOM node.
    
    When creating a DOM from an XML source, XML parsers are required to
    consider several conditions when deciding whether to include
    whitespace-only text nodes. This function ignores all of those
    conditions and removes all whitespace-only text descendants of the
    specified node. If the unlink flag is specified, the removed text
    nodes are unlinked so that their storage can be reclaimed. If the
    specified node is a whitespace-only text node then it is left
    unmodified."""
    
    # Collect first and remove afterwards, so that childNodes is not
    # modified while it is being iterated over.
    remove_list = []
    for child in node.childNodes:
        if child.nodeType == dom.Node.TEXT_NODE and \
           not child.data.strip():
            remove_list.append(child)
        elif child.hasChildNodes():
            remove_whilespace_nodes(child, unlink)
    for child in remove_list:
        child.parentNode.removeChild(child)
        if unlink:
            child.unlink()

      

This code should work with any correctly implemented Python-DOM.
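
As a usage illustration, the following is a minimal sketch that assumes xml.dom.minidom as the DOM implementation; other implementations may behave differently. The sample document and the names `SAMPLE`, `before`, `after`, and `compact` are hypothetical. The function body is repeated from the recipe so that the snippet runs on its own.

```python
from xml.dom import Node, minidom

def remove_whilespace_nodes(node, unlink=False):
    # Same logic as the recipe, repeated here so the sketch is self-contained.
    remove_list = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            remove_list.append(child)
        elif child.hasChildNodes():
            remove_whilespace_nodes(child, unlink)
    for child in remove_list:
        child.parentNode.removeChild(child)
        if unlink:
            child.unlink()

# Hypothetical sample document with indentation between elements.
SAMPLE = """<inventory>
    <item>
        <name>widget</name>
    </item>
</inventory>"""

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

before = len(root.childNodes)   # text, <item>, text
remove_whilespace_nodes(root, unlink=True)
after = len(root.childNodes)    # only <item> remains

compact = root.toxml()
print(before, after)  # 3 1
print(compact)        # <inventory><item><name>widget</name></item></inventory>
```

The text "widget" survives because only text nodes that are empty after strip() are removed. Text with meaningful content is left untouched, including any leading or trailing whitespace it carries.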

Tags: xml

2 comments

David Wilson 19 years, 8 months ago

Error?

Should:

        elif child.hasChildNodes():
            remove_whilespace_nodes(child)

Read:

        elif child.hasChildNodes():
            remove_whilespace_nodes(child, unlink)

?


David.

Brian Quinlan (author) 19 years, 7 months ago

Right you are. Yes, I've updated the recipe.

Created by Brian Quinlan on Thu, 2 Sep 2004 (PSF)

Required Modules

(none specified)

Other Information and Tasks

Licensed under the PSF License
Viewed 19927 times
Revision 2 (updated 19 years ago)
