perl-xml
Re: Keeping numeric entities intact when parsing and serializing?
by Nicolas Mendoza
Oct 10 2009 3:17AM

On Sat, 10 Oct 2009 01:22:30 +0200, Petr Pajas <pajas@[...].cz>
wrote:

>  2009/10/10 Nicolas Mendoza <mendoza@[...].no>:
> > On Fri, 09 Oct 2009 23:58:53 +0200, Petr Pajas <pajas@[...].cz>
> > wrote:
> >
> >> 2009/10/9 Nicolas Mendoza <mendoza@[...].no>:
> >>>
> >>> On Fri, 09 Oct 2009 22:51:32 +0200, Aristotle Pagaltzis
> >>> <pagaltzis@[...].de>
> >>> wrote:
> >>>
> >>>> * Nicolas Mendoza <mendoza@[...].no> [2009-10-09 17:55]:
> >>>>>
> >>>>> Is there some way to keep the entities intact when parsing and
> >>>>> serializing numeric entities?
> >>>>
> >>>> Why would you want such a thing?
> >>>>
> >>>
> >>> Because I'm feeding it data and I want it to come out the same way? Just
> >>> like &amp; does. (I want to distinguish an incoming &#39; and "'". So I
> >>> don't want it to alter my valid XML data, basically.)
> >>>
> >>> Actually it's a bit surprising that libxml2 does that.
> >>
> >> Huh? There is absolutely nothing surprising about that, it is an XML
> >> parser. Programming XML would be hell if XML parsers didn't do this.
> >> From the XML point of view, &#39; and ' are the same thing! Read the
> >> XML spec.
> >>
> >> To be fair, there are few cases when particular formatting matters,
> >> but those are (supposed to be) dealt with by XML C14N (and possibly
> >> XML Encryption and XML Signature).
> >>
> >
> > I think I can sympathize with your sentiments, but I'm not sure I can agree
> > 100% that altering in-data is the optimal way of functioning.
> 
>  The data is the characters, not the way they are serialized in XML.
>  What you want is of the same nature as asking the parser to remember
>  the original whitespace within XML tags, e.g. to distinguish
> 
>  <foo bar="baz"/>
> 
>  from
> 
>  <foo
>         bar="baz"
>      />
> 
>  No widely-adopted XML API can preserve this distinction, nor can it
>  preserve the distinction between a character and the corresponding
>  numerical entity. XML APIs typically exchange content, not
>  representation.
> 
> > No matter if they are the same in theory.
> 
>  What theory would that be? I'm not talking about theories, I'm talking
>  about the XML 1.0 spec.
> 
> > Why would it be hell if XML parsers didn't convert numeric entities to
> > ASCII, UTF-8 (or whatever charset is possible/available at the time) on
> > serialization?
> 
>  It would be hell because it would be like having no parser at all.
> 
>  Also note that parsers don't "convert" anything on serialization, but
>  already during parse. In fact, parsers don't serialize (serializers
>  do); they parse, i.e. read the input and decode the content into some
>  structured form, passing it via some API to a handler (possibly a
>  serializer or an application). Content, not the way it is encoded
>  (although, to be fair, some parsers do send the original as well, e.g.
>  via offsets to the source stream; from the implementation point of
>  view this may be costly, from the practical point of view it is seldom
>  useful).
> 
>  So to sum up: if you don't want a parser to parse your input, then
>  don't use one! Just process the XML as text (e.g. using regexps, a
>  lexer, a tokenizer, or a parser that gives you real low-level access),
>  since apparently you are not interested in the content of the XML
>  document but one particular textual representation of the content in
>  XML.
> 
>  -- Petr
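
Petr's point that the parser hands over content, not representation, can be
sketched in a few lines. This is only an illustration: it uses Python's
standard xml.etree.ElementTree rather than the libxml2 / XML::LibXML stack
discussed in the thread, on the assumption that any conforming XML parser
behaves the same way here. The element name "msg" is made up for the example.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data: a numeric character
# reference and the literal character.
with_reference = ET.fromstring("<msg>it&#39;s</msg>")
with_literal = ET.fromstring("<msg>it's</msg>")

# The reference is resolved during the parse; the tree holds only characters.
assert with_reference.text == with_literal.text == "it's"

# Whitespace inside a tag is likewise not part of the parsed content.
compact = ET.fromstring('<foo bar="baz"/>')
spread = ET.fromstring('<foo\n        bar="baz"\n     />')
assert compact.attrib == spread.attrib == {"bar": "baz"}

# Because the trees are identical, serializing them gives identical output;
# the serializer has no record of how the input was spelled.
same_text_output = ET.tostring(with_reference) == ET.tostring(with_literal)
same_tag_output = ET.tostring(compact) == ET.tostring(spread)
```

By the time a serializer runs, the information needed to tell the two inputs
apart is already gone, which is why the difference cannot be restored on output.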

So, given that we accept this, why is &amp; not converted to "'"?
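
One way to look at the &amp; case is that it is converted on parse like any
other reference, and the serializer then escapes the resulting "&" again
because a bare ampersand would not be well-formed in the output. The sketch
below shows this with Python's standard xml.etree.ElementTree, again as a
stand-in for the Perl tooling in the thread and not a statement about
libxml2's internals. The element name "t" and the sample text are made up.

```python
import xml.etree.ElementTree as ET

# "&amp;" and "&#38;" are two ways of writing the same "&" character.
elem = ET.fromstring("<t>fish &amp; chips &#38; peas</t>")

# After parsing, both have become a plain "&" in the tree.
parsed_text = elem.text

# On output the serializer must escape "&", and it picks one spelling
# for every occurrence, regardless of how each was written in the input.
serialized = ET.tostring(elem, encoding="unicode")

print(parsed_text)   # fish & chips & peas
print(serialized)    # <t>fish &amp; chips &amp; peas</t>
```

So &amp; only appears to survive the round trip: it is decoded and then
re-created, whereas a character such as "'" needs no escaping in element
content and is written out literally.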

-- 
Nicolas Mendoza
http://my.opera.com/nicomen