libxml and (X)HTML documents

From: Aaron Straup Cope <a...@vineyard.net>
Wed, 10 Jul 2002 23:50:47 -0400 (EDT)
Hi all,

Can someone help me understand how exactly libxml deals with HTML files
and, more specifically, XHTML files?

I can understand treating HTML files as "special" but it appears that
XHTML files are lumped in with the bad apples even though there isn't any
reason for them to be.

If it's just another thing on the 'to-do' list then I can deal. But, I've
had to jump through all kinds of hoops (see below) to get all the widgets
used by, and including, XML::Filter::XSLT to munge one XHTML document into
another in a SAX context.

It's done so I'm happy enough but it seems completely nuts to have to go
to these lengths.

Thanks,

# in package Aaron::XML::Filter::XSLT

sub end_document {
    my $self = shift;

    # because "IMA" XML::Filter::XSLT so calling
    # SUPER would make bad things happen

    my $dom = $self->XML::LibXML::SAX::Builder::end_document(@_);

    # Gah! In a plain old XML::LibXSLT situation I can
    # call parse_html_file, but since ::SAX::Builder calls
    # $obj->createDocument() there doesn't seem to be anything
    # else but to do the following...

    my $parser = XML::LibXML->new();
    $dom = $parser->parse_html_string($dom->toString());

    my $xslt       = XML::LibXSLT->new();
    my $stylesheet = $xslt->parse_stylesheet($self->{StylesheetDOM});

    my $results = $stylesheet->transform(
        $dom,
        ((ref($self->{'__params'}) eq "ARRAY") ? @{$self->{'__params'}} : ()),
    );

    # see earlier note to list on same subject [1]
    # this subclass basically does the following :
    # "You say HTML_DOCUMENT, I say X(HT)ML_DOCUMENT"

    # named $sax_parser so it does not mask the $parser declared above
    my $sax_parser = Aaron::XML::LibXML::SAX::Parser->new(%$self);
    $sax_parser->generate($results);
}

[1] http://aspn.activestate.com/ASPN/Mail/Message/1189521
