This recipe uses DOM (precisely, cDomlette or the minidom variant in 4Suite) to merge two files containing XBEL bookmark listings. It uses Python 2.2 generators for straightforward and efficient iteration over the XBEL DOM trees in document order. It requires Python 2.2 and 4Suite 0.12.0a2 or a more recent version.
#!/usr/bin/env python
from __future__ import generators
from xml.dom import Node
from Ft.Xml.Domlette import NonvalidatingReader, PrettyPrint

#Yield node and its descendants, in document order, wherever
#filter_func returns true
def in_order_iterator_filter(node, filter_func):
    if filter_func(node):
        yield node
    for child in node.childNodes:
        for cn in in_order_iterator_filter(child, filter_func):
            if filter_func(cn):
                yield cn
    return

#Iterate over the elements with the given namespace and local name
def get_elements_by_tag_name_ns(node, ns, local):
    return in_order_iterator_filter(
        node,
        lambda n: n.nodeType == Node.ELEMENT_NODE and \
                  n.namespaceURI == ns and n.localName == local
        )

#Concatenate all text nodes under node, in document order
def string_value(node):
    text_nodes = in_order_iterator_filter(
        node, lambda n: n.nodeType == Node.TEXT_NODE)
    return u''.join([ n.data for n in text_nodes ])

#Text of the first title element found at or under node
def get_title(node):
    return string_value(
        get_elements_by_tag_name_ns(node, None, 'title').next())

#Move the children of folder_node2 into folder_node1, recursively
#merging sub-folders that have the same title
def merge_folders(folder_node1, folder_node2):
    #Folder element children of folder1
    folder1_folders = \
        [ n for n in folder_node1.childNodes if n.nodeName == 'folder' ]
    #Yes, the list must be copied to avoid mutate-while-iterate bugs
    for elem in folder_node2.childNodes[:]:
        #No need to copy title element
        if elem.nodeName == 'title':
            continue
        #Merge into a folder with the same title, if there is one
        elif elem.nodeName == 'folder':
            title = get_title(elem)
            for a_folder in folder1_folders:
                if title == get_title(a_folder):
                    merge_folders(a_folder, elem)
                    break
            else:
                #No folder with this title in folder1
                folder_node1.appendChild(elem)
        else:
            folder_node1.appendChild(elem)

#Merge the top-level folders and bookmarks of xbel2 into xbel1
def xbel_merge(xbel1, xbel2):
    xbel1_top_level = \
        [ n for n in xbel1.documentElement.childNodes \
          if n.nodeType == Node.ELEMENT_NODE ]
    xbel1_top_level_folders = \
        [ n for n in xbel1_top_level if n.nodeName == 'folder' ]
    xbel1_top_level_bookmarks = \
        [ n for n in xbel1_top_level if n.nodeName == 'bookmark' ]
    xbel2_top_level = \
        [ n for n in xbel2.documentElement.childNodes \
          if n.nodeType == Node.ELEMENT_NODE ]
    for elem in xbel2_top_level:
        if elem.nodeName == 'folder':
            title = get_title(elem)
            for a_folder in xbel1_top_level_folders:
                if title == get_title(a_folder):
                    merge_folders(a_folder, elem)
                    break
            else:
                #No top-level folder with this title in xbel1
                xbel1.documentElement.appendChild(elem)
        elif elem.nodeName == 'bookmark':
            xbel1.documentElement.appendChild(elem)
    return xbel1

if __name__ == "__main__":
    import sys
    xbel1 = NonvalidatingReader.parseUri(sys.argv[1])
    xbel2 = NonvalidatingReader.parseUri(sys.argv[2])
    new_xbel = xbel_merge(xbel1, xbel2)
    PrettyPrint(new_xbel)
This is actually an updated version of an old script I posted ages ago:
http://mail.python.org/pipermail/xml-sig/1999-September/001441.html
This version is much faster and uses current APIs. For more info on XBEL, see:
http://pyxml.sourceforge.net/topics/xbel/
An alternate implementation could use straight Python 2.2 minidom. The main changes would be using "minidom.parse" instead of "NonvalidatingReader.parseUri" and "new_xbel.toxml()" instead of "PrettyPrint(new_xbel)".
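The minidom alternative might look roughly like the following in modern Python 3, using only the standard library's xml.dom.minidom. This is a sketch under stated assumptions, not a port of the recipe: it copies nodes with importNode rather than moving them, it treats the top level the same way as nested folders, and the helper name merge_children and the inline sample documents are made up for illustration.

```python
from xml.dom import minidom, Node

def get_title(folder):
    # First <title> in document order, like the recipe's get_title()
    title = folder.getElementsByTagName('title')[0]
    return ''.join(n.data for n in title.childNodes
                   if n.nodeType == Node.TEXT_NODE)

def merge_children(target, source, doc):
    # Hypothetical helper: copy source's element children into target,
    # merging folders that share a title. Nodes are imported (copied)
    # into doc, so source is never modified while we iterate over it.
    target_folders = [n for n in target.childNodes if n.nodeName == 'folder']
    for elem in source.childNodes:
        if elem.nodeName == 'title':
            continue  # target keeps its own title
        if elem.nodeName == 'folder':
            title = get_title(elem)
            for folder in target_folders:
                if get_title(folder) == title:
                    merge_children(folder, elem, doc)
                    break
            else:
                # No folder with this title yet: copy the whole subtree
                target.appendChild(doc.importNode(elem, True))
        elif elem.nodeType == Node.ELEMENT_NODE:
            target.appendChild(doc.importNode(elem, True))

def xbel_merge(doc1, doc2):
    merge_children(doc1.documentElement, doc2.documentElement, doc1)
    return doc1

# Tiny made-up documents; real files would be read with minidom.parse(path)
doc1 = minidom.parseString(
    '<xbel><folder><title>XML</title>'
    '<bookmark href="http://a.example/"><title>A</title></bookmark>'
    '</folder></xbel>')
doc2 = minidom.parseString(
    '<xbel><folder><title>XML</title>'
    '<bookmark href="http://b.example/"><title>B</title></bookmark>'
    '</folder><folder><title>XSL</title></folder></xbel>')
print(xbel_merge(doc1, doc2).toxml())
```

Copying with importNode avoids the need to iterate over a copy of childNodes, at the cost of duplicating the second tree's nodes in memory.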
For a great introduction to generators, about which I can hardly rave enough, see:
http://www-106.ibm.com/developerworks/library/l-pycon.html
http://www-106.ibm.com/developerworks/linux/library/l-pythrd.html
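For readers who want to see the generator idea in isolation, here is a minimal sketch of a document-order traversal written for Python 3.3 or later, where "yield from" replaces the nested for loop used in the recipe (that syntax does not exist in Python 2.2). It uses the standard library's minidom rather than 4Suite, and the function name in_document_order and the sample markup are assumptions made for this example. Because the recursive call already filters, no second filter_func check is needed on the way back up.

```python
from xml.dom import minidom

def in_document_order(node, keep):
    # Lazily yield node and its descendants, depth first, when keep() is true
    if keep(node):
        yield node
    for child in node.childNodes:
        # Delegate to the sub-generator for each subtree
        yield from in_document_order(child, keep)

doc = minidom.parseString(
    '<folder><title>XML</title>'
    '<bookmark><title>ZVON.org</title></bookmark></folder>')

titles = in_document_order(doc, lambda n: n.nodeName == 'title')
# Nothing has been traversed yet; next() walks only as far as the first match
first_title = next(titles)
print(first_title.firstChild.data)
```

The laziness is what makes get_title cheap in the recipe: it asks for only the first matching element and the rest of the tree is never visited.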
To test this script, you can use the following 2 XBEL files:
Bookmarks of Joris Graaumans [excerpt]
XML
ZVON.org
XML.ORG - The XML Industry Portal
The XML Cover Pages - Home Page
Software
xmlsoftware.com
VBXML
Xpath Visualiser
DOM
JavaScript DOM level 1
JavaScript DOM examples
DTDs
TEI
TEI pizza chef
TEI Consortium
Xbel
Xbel homepage
Misc
XMLephant: Technologies/DTDs_and_Examples
DocBook DTD
DocBook Character Entity Reference
Docbook reference guide
and
Bookmarks of Joris Graaumans [excerpt]
Dictionaries
University of Alberta Cognitive Science Dictionary (Home Page)
Cognitive science woordenboek
Dictionary.com
Cambridge International Dictionaries
Het Van Dale Taalweb
XML
XML discussion lists
The World Wide Web Consortium
DOM
DOM-Level-2-Core
XSL
XSLT benchmark
xml.apache.org Examples
XSL specs van W3c
XSL working draft 1.0 van het W3c
Extensible Stylesheet Language (XSL)
Discussion lists
Mulberry Technologies, Inc.: XSL-List -- Open Forum on XSL
Mulberry Technologies, Inc.: XSL
Just call "python xbel_merge.py bm1.xbel bm2.xbel" and the results will go to stdout.
The sample XML files got corrupted. It looks as if the Cookbook uploader can't take XML in the "notes" field. Yes, I tried preview, and it did show the tags, though with indentation removed. It was nothing like what showed up in the end. I've put the 2 XBEL files you can use for testing at
http://uche.ogbuji.net/etc/020625/bm1.xbel
and
http://uche.ogbuji.net/etc/020625/bm2.xbel
Sorry for any inconvenience.