perl-xml
RE: A whitespace issue in XML::LibXML
by Mark - BLS CTR Thomas
Jul 20 2007 9:23AM

Can you explain what you want a little more clearly? "Needs to be
isolated" isn't much to go on. Example desired output for the given
input would be nice.

- Mark.

>  -----Original Message-----
>  From: perl-xml-bounces@[...].com [mailto:perl-xml-bounces@[...].com]
>  On Behalf Of Birgit Kellner
>  Sent: Friday, July 20, 2007 11:35 AM
>  To: perl-xml@[...].com
>  Subject: A whitespace issue in XML::LibXML
>  
>  Here's a follow-up to my question from yesterday about parsing footnotes
>  with XML::LibXML.
>  
>  This time, I'm parsing verse lines, so it's a different situation, but
>  the replies to yesterday's query helped me to build a string out of the
>  text nodes in a mixed-content node that is part of a larger structure.
>  
>  I'm recursively parsing from a higher level all the way down, always
>  passing a node to a subroutine. If it is a text node, the subroutine
>  appends its data content to a scalar and ends; if the node has children,
>  the subroutine is called again.
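
The recursive walk described here could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the subroutine name collect_text and the sample markup are made up, and it relies only on the documented XML::LibXML calls nodeType, data and childNodes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: append the data of every text node below $node
# to the scalar referenced by $buf, descending depth-first.
sub collect_text {
    my ($node, $buf) = @_;
    if ($node->nodeType == XML_TEXT_NODE) {
        $$buf .= $node->data;    # text node: append its content and stop
        return;
    }
    collect_text($_, $buf) for $node->childNodes;    # otherwise recurse
}

# Made-up sample input, modelled on the structure shown in the post.
my $doc = XML::LibXML->new->parse_string(
    '<l><seg n="a">one <note>x</note>two</seg><seg n="b">three</seg></l>'
);

my $text = '';
collect_text($doc->documentElement, \$text);
print "$text\n";
```

Note that a walk like this also picks up whitespace-only text nodes between elements, which is where the problem described further down comes from.
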
>  
>  This is the code (minus a few attributes):
>  
>  <lg><note><span><l>
>  <seg n="a">......</seg>
>  <seg n="b">......</seg></l><l>
>  <seg n="c">......</seg>
>  <seg n="d">......</seg>
>  </l></span><app>
>  ...
>  ...
>  </app></note></lg>
>  
>  The text nodes are inside the <seg>-elements, which may also contain
>  further <note>-elements, and whatnot.
>  
>  The <l>-elements are verse lines, and their data content needs to be
>  isolated.
>  This is done in two different ways: for the <l>-element containing
>  <seg>s "a" and "b", the trigger is the beginning of <seg> "c". The
>  scalar content is then copied and emptied out.
>  
>  For the <l>-element containing <seg>s "c" and "d", the routine checks
>  whether the last <seg> parsed was "d" and whether the node in question
>  is a text node with no next sibling. If so, it is the last text node in
>  the last segment of the verse, and thus the second verse line is
>  complete.
>  
>  This last check runs into problems, however, when there is additional
>  whitespace before the closing </lg>-tag:
>  
>  <lg><note><span><l>
>  <seg n="a">......</seg>
>  <seg n="b">......</seg></l><l>
>  <seg n="c">......</seg>
>  <seg n="d">......</seg>
>  </l></span><app>
>  ...
>  ...
>  </app></note>
>  </lg>
>  
>  The newline after the closing </note> tag results in a text node with
>  whitespace as its content. When the script arrives at that node, it
>  determines that this is a text node, that it has no next sibling, and
>  that the last <seg> parsed was "d", so the check passes although the
>  node lies outside the verse.
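
The behaviour can be reproduced in a few lines. The sketch below uses a cut-down, made-up version of the markup and a parser with default settings; it should report that the last child of <lg> is a whitespace-only text node with no next sibling, which is exactly the node that fools the check.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down sample (made up): note the newline between </note> and </lg>.
my $xml = qq{<lg><note><l><seg n="d">fourth</seg></l></note>\n</lg>};
my $doc = XML::LibXML->new->parse_string($xml);

# Inspect the last child of the root element.
my $last      = $doc->documentElement->lastChild;
my $is_text   = $last->nodeType == XML_TEXT_NODE ? 1 : 0;
my $is_blank  = ($is_text && $last->data !~ /\S/) ? 1 : 0;
my $has_next  = defined $last->nextSibling ? 1 : 0;
my $prev_name = $last->previousSibling->nodeName;

print "text=$is_text blank=$is_blank next=$has_next prev=$prev_name\n";
```
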
>  
>  I can get around this problem by testing, in addition to checking the
>  "n"-attribute of the last parsed <seg> and the absence of a next
>  sibling, whether the text node is actually inside a <seg>
>  ($el->findnodes('ancestor::seg'); the <seg> need not be the direct
>  parent).
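
A minimal sketch of that workaround, assuming the test is wrapped in a small predicate. The name in_seg, the <hi> element and the sample markup are hypothetical; findnodes with the ancestor axis is the call quoted in the post.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical predicate: true if $node is a text node that has a <seg>
# somewhere among its ancestors (not necessarily as its parent).
sub in_seg {
    my ($node) = @_;
    return 0 unless $node->nodeType == XML_TEXT_NODE;
    my @seg = $node->findnodes('ancestor::seg');
    return @seg ? 1 : 0;
}

# Made-up sample: one text node nested two levels below <seg>, and a
# whitespace text node directly before </lg>.
my $doc = XML::LibXML->new->parse_string(
    qq{<lg><l><seg n="d">four<hi>th</hi></seg></l>\n</lg>}
);

my ($nested) = $doc->findnodes('//hi/text()');
my $trailing = $doc->documentElement->lastChild;

my $nested_ok   = in_seg($nested);
my $trailing_ok = in_seg($trailing);
print "nested: $nested_ok, trailing: $trailing_ok\n";
```

The trailing whitespace node fails the test because none of its ancestors is a <seg>, so it no longer triggers the end-of-verse logic.
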
>  
>  But still, I'm intrigued that this actually happens, and was wondering
>  whether there is any way to make XML::LibXML ignore this whitespace.
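
One option worth trying is the parser's keep_blanks setting. The sketch below is an illustration under assumptions, not a tested answer for the real document: the segment contents and the empty <app/> are placeholders, and blank_text_nodes is a made-up helper. It parses the same markup twice and counts whitespace-only text nodes each time.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample modelled on the post; segment contents and <app/> are placeholders.
my $xml = <<'XML';
<lg><note><span><l>
<seg n="a">first</seg>
<seg n="b">second</seg></l><l>
<seg n="c">third</seg>
<seg n="d">fourth</seg>
</l></span><app/></note>
</lg>
XML

# Hypothetical helper: count text nodes that contain only whitespace.
sub blank_text_nodes {
    my ($doc) = @_;
    return scalar grep { $_->data !~ /\S/ } $doc->findnodes('//text()');
}

my $default = XML::LibXML->new;    # keeps blank text nodes by default
my $no_blanks = XML::LibXML->new;
$no_blanks->keep_blanks(0);        # ask libxml2 to drop ignorable whitespace

my $kept    = blank_text_nodes($default->parse_string($xml));
my $dropped = blank_text_nodes($no_blanks->parse_string($xml));
print "default: $kept blank text nodes; keep_blanks(0): $dropped\n";
```

A caveat: without a DTD, libxml2 decides heuristically which whitespace is ignorable, so in mixed content it may also drop a space that matters (for example, between two inline elements), and it can keep whitespace in an element that contains nothing else. Checking the result against real data, or keeping the ancestor::seg test as a safeguard, seems prudent.
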
>  
>  
>  
>  Birgit
>  
>  _______________________________________________
>  Perl-XML mailing list
>  Perl-XML@[...].com
>  To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
