ASPN ActiveState Programmer Network
perl-xml
Re: libxml and (X)HTML documents
by Mark Fowler
Jul 11 2002 9:25AM
On Thu, 11 Jul 2002, Christian Glahn wrote:

>  so basically libxml2 uses the same parser for XML and HTML data, where 
>  of the XML parser. 

I'm currently working on the XML::LibXML plugin for the Template Toolkit.
It has two interfaces.  The first, more complicated interface,
allows you to pass named parameters for the type of data source you want
parsed.  The second tries to guess what you meant when you passed in a
single scalar.

Here's the current code for that guessing:

sub _guess_type
{
    # look for a filehandle
    return "fh" if _openhandle($_[0]);

    # okay, look for the xml declaration at the start
    return "string" if $_[0] =~ m/^\<\?xml/;

    # okay, look for an opening html tag (with or without attributes)
    # anywhere in the doc
    return "html_string" if $_[0] =~ m/<html[\s>]/i;

    # okay, if this contains a "<" symbol at all, declare it to be
    # xml, though they should really start with "<?xml"
    return "string" if $_[0] =~ m{\<};

    # okay, we've tried everything else, return a filename
    return "file";
}

That'll be turned into a call to $libxml->parse_$returnvalue($data)
later on.
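
As a rough sketch of that dispatch step (not the plugin's actual code), the
method name can be built at run time and looked up with can().  The
My::StubParser class and the dispatch_parse name are made up for
illustration so this runs without XML::LibXML installed; the method names
mirror parse_fh, parse_string, parse_html_string and parse_file on the real
parser object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XML::LibXML parser object, so the sketch
# runs without the module installed. Each method just reports its name.
package My::StubParser;
sub new               { return bless {}, shift }
sub parse_fh          { return "parse_fh" }
sub parse_string      { return "parse_string" }
sub parse_html_string { return "parse_html_string" }
sub parse_file        { return "parse_file" }

package main;

# Dispatch on the guessed type by building the method name at run time.
# dispatch_parse() is an illustrative name, not part of the plugin.
sub dispatch_parse {
    my ($parser, $type, $data) = @_;
    my $method = "parse_$type";

    # can() returns a code ref if the method exists, so an unknown type
    # dies with a clear message instead of a missing-method error.
    my $code = $parser->can($method)
        or die "no such parser method: $method\n";
    return $parser->$code($data);
}

my $parser = My::StubParser->new;
print dispatch_parse($parser, "html_string", "<html><body/></html>"), "\n";
print dispatch_parse($parser, "file", "doc.xml"), "\n";
```

Going through can() rather than calling $parser->$method directly means a
bad return value from the guessing code is caught in one place.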

My question, then: is the separate html detection stage needed, or can I
just throw it all at parse_html_string?  It all seems to work at the
moment, but I was wondering if I'm jumping through the wrong hoops.

Mark.

-- 
s''  Mark Fowler                                     London.pm   Bath.pm
     http://www.twoshortplanks.com/              mark@[...].com
';use Term'Cap;$t=Tgetent Term'Cap{};print$t->Tputs(cl);for$w(split/  +/
){for(0..30){$|=print$t->Tgoto(cm,$_,$y)." $w";select$k,$k,$k,.03}$y+=2}