ASPN ActiveState Programmer Network
perl-xml
parse_balanced_chunk with document context
by Nicolas Mendoza
Jul 10 2007 11:52PM

Hi,

I have just started using XML::LibXML 1.63 and have run into a problem
when parsing chunks of XML that contain (X)HTML entities.
Parsing an entire document works fine, because there I can include
declarations for the (X)HTML entities that XML does not predefine. But
once I use parse_balanced_chunk, which builds a document fragment from a
balanced chunk of XML, I have no way to tell it to use the entity
declarations of the document the fragment is meant to go into.
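A minimal sketch of the difference, assuming XML::LibXML is installed. The `page` element, the `nbsp` declaration and the sample text are made up for illustration, not taken from my real code. I would expect the whole-document parse to expand `&nbsp;` and the chunk parse to die with an undeclared-entity error, which the `eval` turns into a message:

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# A whole document can declare an (X)HTML entity in its internal DTD subset,
# so the parser knows how to expand &nbsp; (illustrative declaration).
my $doc = $parser->parse_string(<<'XML');
<?xml version="1.0"?>
<!DOCTYPE page [ <!ENTITY nbsp "&#160;"> ]>
<page>a&nbsp;b</page>
XML

# A bare balanced chunk has no DTD to consult, so &nbsp; is undeclared here.
# The eval catches the parse error instead of letting the script die.
my $frag = eval { $parser->parse_balanced_chunk('<p>a&nbsp;b</p>') };

print 'document text: ',
    $doc->documentElement->textContent eq "a\x{a0}b" ? "ok\n" : "unexpected\n";
print defined $frag ? "chunk parsed\n" : "chunk rejected: $@";
```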

I partly solved this by hacking the C (XS) part of XML::LibXML to accept
an additional, optional parameter: a context document (the underlying
libxml2 function used by parse_balanced_chunk accepts one). However, I
have little experience writing C extensions for Perl and may be doing
something wrong: when I later use the resulting DOM fragment with
$dom_frag->parentNode()->replaceChild($dom_frag,$inc_ele), it segfaults
during reconciliation. I suspect that either a) this code path is not
tested for a document fragment that has a context document, or b) the
pointer to the document is altered or simply wrong somewhere in the C
code.

The diff for my changes is currently: http://utilitybase.com/paste/4826

Example code using this: http://utilitybase.com/paste/4827

However, I was told that this functionality might not be possible to add
at all. If that is the case, does anyone have hints on how to parse a
string of XML containing (X)HTML entities so that I can insert it into an
XML document that declares those entities?

(My temporary solution is to convert the (X)HTML entities to numeric
character references right before calling parse_balanced_chunk. I _could_
also wrap the content in CDATA sections, but I don't really like either
solution.)
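A rough sketch of that conversion step in plain Perl, assuming a hand-maintained table of entity names to Unicode code points. The `named_to_numeric` helper and the four table entries are hypothetical; XML's five predefined entities are left alone, and unknown names are passed through so the parser still reports them:

```perl
use strict;
use warnings;

# Hypothetical, deliberately small table mapping (X)HTML entity names to
# Unicode code points; a real table would cover every XHTML 1.0 entity.
my %html_entity = (
    nbsp   => 160,
    copy   => 169,
    eacute => 233,
    hellip => 8230,
);

# XML predefines these five entities, so they are left untouched.
my %xml_predefined = map { $_ => 1 } qw(amp lt gt quot apos);

# Rewrite known named entities as numeric character references.
# Unknown names are left as-is so the XML parser still reports them.
sub named_to_numeric {
    my ($chunk) = @_;
    $chunk =~ s{&([A-Za-z][A-Za-z0-9]*);}{
        $xml_predefined{$1}       ? "&$1;"
        : exists $html_entity{$1} ? "&#$html_entity{$1};"
        :                           "&$1;"
    }ge;
    return $chunk;
}

# Prints: <p>a&#160;b &amp; &#8230;</p>
print named_to_numeric('<p>a&nbsp;b &amp; &hellip;</p>'), "\n";
```

The converted string can then go straight to parse_balanced_chunk, since numeric character references need no DTD. The HTML::Entities module carries a complete entity table that could replace the hand-written hash.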

Apologies in advance if I have missed any of the list's posting
conventions; if so, please let me know how to do it correctly.

-- 
Thanks,
Nicolas Mendoza
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

© 2004 ActiveState, a division of Sophos All rights reserved