Re: libxml and (X)HTML documents

From: Mark Fowler <m...@twoshortplanks.com>
Thu, 11 Jul 2002 10:25:24 +0100 (BST)
On Thu, 11 Jul 2002, Christian Glahn wrote:

> so basically libxml2 uses the same parser for XML and HTML data, where
> of the XML parser.

I'm currently working on the XML::LibXML plugin for the Template Toolkit.
It has two interfaces.  The first, more complicated interface
allows you to pass named parameters for the type of data source you want
parsed.  The second tries to guess what you meant when you passed in a
single scalar.

Here's the current code for that guessing:

sub _guess_type
{
    # look for a filehandle
    return "fh" if _openhandle($_[0]);

    # okay, look for the xml declaration at the start
    return "string" if $_[0] =~ m/^\<\?xml/;

    # okay, look for an opening html tag anywhere in the doc
    # (with or without attributes)
    return "html_string" if $_[0] =~ m/<html[\s>]/i;

    # okay, does this contain a "<" symbol?  If so, declare it to be
    # xml, though they should really use "<?xml"
    return "string" if $_[0] =~ m{<};

    # okay, we've tried everything else, return a filename
    return "file";
}
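
The guesser above depends on a helper, _openhandle, which the message does
not show.  A minimal sketch of what it might look like, assuming it is a
thin wrapper around openhandle from Scalar::Util (that is an assumption,
not something the post confirms):

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Assumed helper: returns 1 only when the argument is an open filehandle,
# 0 for anything else (plain strings, filenames, undef).
sub _openhandle {
    my ($candidate) = @_;
    return defined openhandle($candidate) ? 1 : 0;
}

# An in-memory handle counts as a filehandle; a filename string does not.
my $xml = "<doc/>";
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";

print _openhandle($fh)       ? "fh\n" : "not a fh\n";
print _openhandle("doc.xml") ? "fh\n" : "not a fh\n";
```

Checking for a handle first matters because a glob or handle would not
survive the regex tests that follow in any meaningful way.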

That'll be turned into a call to $libxml->parse_$returnvalue($data)
later on.
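
Since parse_$returnvalue is shorthand rather than valid Perl, here is a
rough sketch of how that dispatch could be written with a method name held
in a scalar.  The names _parse_as and My::StubParser are made up for
illustration; the stub stands in for a real XML::LibXML parser object so
the sketch runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XML::LibXML parser.  Each method just
# reports its own name so the dispatch is easy to observe.
package My::StubParser;
sub new               { return bless {}, shift }
sub parse_string      { return "parse_string" }
sub parse_html_string { return "parse_html_string" }

package main;

# Build the method name from the guessed type and call it on the parser.
# Dies early if the parser has no such method.
sub _parse_as {
    my ($parser, $type, $data) = @_;
    my $method = "parse_$type";
    die "parser has no method $method\n" unless $parser->can($method);
    return $parser->$method($data);
}

my $parser = My::StubParser->new;
print _parse_as($parser, "html_string", "<html><body/></html>"), "\n";
print _parse_as($parser, "string",      "<?xml version='1.0'?><a/>"), "\n";
```

With the real module, the stub would be replaced by XML::LibXML->new, which
provides parse_fh, parse_string, parse_html_string and parse_file.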

My question, then, is whether the separate html detection stage is needed,
or whether I can just throw it all at parse_html_string.  It all seems to
work atm, but I was wondering if I'm jumping through the wrong hoops.

Mark.

-- 
s''  Mark Fowler                                     London.pm   Bath.pm
     http://www.twoshortplanks.com/              m...@twoshortplanks.com
';use Term'Cap;$t=Tgetent Term'Cap{};print$t->Tputs(cl);for$w(split/  +/
){for(0..30){$|=print$t->Tgoto(cm,$_,$y)." $w";select$k,$k,$k,.03}$y+=2}
