Hi all,
Can someone help me understand how exactly libxml deals with HTML files
and, more specifically, XHTML files?
I can understand treating HTML files as "special" but it appears that
XHTML files are lumped in with the bad apples even though there isn't any
reason for them to be.
If it's just another item on the 'to-do' list then I can live with it. But I've
had to jump through all kinds of hoops (see below) to get all the widgets
used by, and including, XML::Filter::XSLT to munge one XHTML document into
another in a SAX context.
It's done, so I'm happy enough, but it seems completely nuts to have to go
to these lengths.
Thanks,
# in package Aaron::XML::Filter::XSLT
sub end_document {
    my $self = shift;

    # Because "IMA" XML::Filter::XSLT, calling
    # SUPER would make bad things happen.
    my $dom = $self->XML::LibXML::SAX::Builder::end_document(@_);

    # Gah! In a plain old XML::LibXSLT situation I can
    # call parse_html_file, but since ::SAX::Builder calls
    # $obj->createDocument() there doesn't seem to be anything
    # else to do but serialize the DOM and re-parse it as HTML...
    my $parser = XML::LibXML->new();
    $dom = $parser->parse_html_string($dom->toString());

    my $xslt       = XML::LibXSLT->new();
    my $stylesheet = $xslt->parse_stylesheet($self->{StylesheetDOM});
    my $results    = $stylesheet->transform(
        $dom,
        (ref($self->{'__params'}) eq "ARRAY") ? @{ $self->{'__params'} } : (),
    );

    # See earlier note to list on same subject [1].
    # This subclass basically does the following:
    # "You say HTML_DOCUMENT, I say X(HT)ML_DOCUMENT"
    # (named $generator so it doesn't mask $parser above)
    my $generator = Aaron::XML::LibXML::SAX::Parser->new(%$self);
    $generator->generate($results);
}
[1] http://aspn.activestate.com/ASPN/Mail/Message/1189521