Re: libxml and (X)HTML documents

From: Mark Fowler <m...@twoshortplanks.com>

Thu, 11 Jul 2002 10:25:24 +0100 (BST)

On Thu, 11 Jul 2002, Christian Glahn wrote:

> so basically libxml2 uses the same parser for XML and HTML data, where
> [...] of the XML parser.

I'm currently working on the XML::LibXML plugin for the Template Toolkit.
It has two interfaces.  The first, more complicated, interface
allows you to pass named parameters for the type of data source you want
parsed.  The second tries to guess what you meant when you pass in a
single scalar.

Here's the current code for that guessing:

sub _guess_type
{
    my ($data) = @_;

    # look for a filehandle (_openhandle is presumably a helper defined
    # elsewhere in the plugin, along the lines of Scalar::Util::openhandle)
    return "fh" if _openhandle($data);

    # okay, look for the xml declaration at the start
    return "string" if $data =~ m/^<\?xml/;

    # okay, look for the html start tag anywhere in the doc; \b also
    # catches tags with attributes, such as <html xmlns="...">
    return "html_string" if $data =~ m/<html\b/i;

    # okay, does this contain a "<" symbol?  Declare it to be xml
    # if it's got one, though they should use "<?xml"
    return "string" if $data =~ m/</;

    # okay, we've tried everything else, so assume it's a filename
    return "file";
}

That'll be turned into a call to $libxml->parse_$returnvalue($data)
later on.
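Turning the guessed type into $libxml->parse_$returnvalue($data) can be sketched with Perl's dynamic method-call syntax. This is a minimal sketch under stated assumptions, not the plugin's actual code: guess_type and parse_guessed are hypothetical names, Scalar::Util's openhandle stands in for the plugin's own _openhandle, and the parser is assumed to be an XML::LibXML object, which does provide parse_fh, parse_string, parse_html_string and parse_file.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical condensed version of the _guess_type heuristics.
sub guess_type {
    my ($data) = @_;
    return "fh"          if openhandle($data);    # open filehandle
    return "string"      if $data =~ m/^<\?xml/;  # XML declaration at start
    return "html_string" if $data =~ m/<html\b/i; # <html> or <html ...>
    return "string"      if $data =~ m/</;        # any markup at all
    return "file";                                # otherwise a filename
}

# Hypothetical dispatcher: builds "parse_fh", "parse_string",
# "parse_html_string" or "parse_file" and calls it by name on
# whatever parser object it is given.
sub parse_guessed {
    my ($parser, $data) = @_;
    my $method = "parse_" . guess_type($data);
    die "parser cannot $method\n" unless $parser->can($method);
    return $parser->$method($data);
}
```

In the plugin this might be called as parse_guessed(XML::LibXML->new, $input). The can() check turns an unexpected type into a readable error rather than Perl's generic "Can't locate object method" failure.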

My question, then: is the separate HTML detection stage needed, or can I
just throw everything at parse_html_string?  It all seems to work at the
moment, but I was wondering if I'm jumping through the wrong hoops.

Mark.

-- 
s''  Mark Fowler                                     London.pm   Bath.pm
     http://www.twoshortplanks.com/              m...@twoshortplanks.com
';use Term'Cap;$t=Tgetent Term'Cap{};print$t->Tputs(cl);for$w(split/  +/
){for(0..30){$|=print$t->Tgoto(cm,$_,$y)." $w";select$k,$k,$k,.03}$y+=2}
