Re: Keeping numeric entities intact when parsing and serializing?
by Nicolas Mendoza
Oct 10 2009 3:17AM
On Sat, 10 Oct 2009 01:22:30 +0200, Petr Pajas <pajas@[...].cz>
wrote:
> 2009/10/10 Nicolas Mendoza <mendoza@[...].no>:
> > On Fri, 09 Oct 2009 23:58:53 +0200, Petr Pajas <pajas@[...].cz>
> > wrote:
> >
> >> 2009/10/9 Nicolas Mendoza <mendoza@[...].no>:
> >>>
> >>> On Fri, 09 Oct 2009 22:51:32 +0200, Aristotle Pagaltzis
> >>> <pagaltzis@[...].de>
> >>> wrote:
> >>>
> >>>> * Nicolas Mendoza <mendoza@[...].no> [2009-10-09 17:55]:
> >>>>>
> >>>>> Is there some way to keep the entities intact when parsing and
> >>>>> serializing numeric entities?
> >>>>
> >>>> Why would you want such a thing?
> >>>>
> >>>
> >>> Because I'm feeding it data and I want it to come out the same way?
> >>> Just like &amp; does. (I want to distinguish an incoming ' and "&#39;".
> >>> So I don't want it to alter my valid XML data, basically.)
> >>>
> >>> Actually it's a bit surprising that libxml2 does that.
> >>
> >> Huh? There is absolutely nothing surprising about that; it is an XML
> >> parser. Programming XML would be hell if XML parsers didn't do this.
> >>
> >> From the XML point of view, &#39; and ' are the same thing! Read the
> >> XML spec.
> >>
> >> To be fair, there are few cases when particular formatting matters,
> >> but those are (supposed to be) dealt with by XML C14N (and possibly
> >> XML Encryption and XML Signature).
> >>
> >
> > I think I can sympathize with your sentiments, but I'm not sure I can
> > agree 100% that altering the input data is the optimal way of functioning.
>
> The data is the characters, not the way they are serialized in XML.
> What you want is of the same nature as asking the parser to remember
> the original whitespace within XML tags, e.g. to distinguish
>
> <foo bar="baz"/>
>
> from
>
> <foo
> bar="baz"
> />
>
> No widely-adopted XML API can preserve this distinction, nor can it
> preserve the distinction between a character and the corresponding
> numerical entity. XML APIs typically exchange content, not
> representation.
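
A minimal sketch of this point. It assumes Python's standard xml.etree.ElementTree module as a stand-in for the libxml2 bindings discussed in this thread, and the helper name same_content is made up for illustration. After parsing, neither the whitespace inside the tag nor the choice between a literal apostrophe and &#39; is visible through the API:

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element; only whitespace inside the tag differs.
compact = ET.fromstring('<foo bar="baz"/>')
spread = ET.fromstring('<foo\n  bar="baz"\n/>')

def same_content(a, b):
    """Hypothetical helper: compare what the API exposes for two elements."""
    return (a.tag, a.attrib, a.text) == (b.tag, b.attrib, b.text)

# A literal apostrophe versus the numeric character reference &#39;.
literal = ET.fromstring("<p>it's</p>")
numeric = ET.fromstring("<p>it&#39;s</p>")

print(same_content(compact, spread))  # the two <foo> spellings compare equal
print(literal.text == numeric.text)   # both texts are the string it's
```

Other XML APIs may expose more or less detail than this one, so treat the result as an illustration rather than a statement about every parser.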
>
> > No matter if they are the same in theory.
>
> What theory would that be? I'm not talking about theories, I'm talking
> about the XML 1.0 spec.
>
> > Why would it be hell if XML parsers didn't convert numeric entities to
> > ASCII, UTF-8 (or whatever charset is possible/available at the time) on
> > serialization?
>
> It would be hell because it would be like having no parser at all.
>
> Also note that parsers don't "convert" anything on serialization, but
> already during parsing. In fact, parsers don't serialize (serializers
> do); they parse, i.e. read the input and decode the content into some
> structured form, passing it via some API to a handler (possibly a
> serializer or an application). They pass the content, not the way it is
> encoded (although, to be fair, some parsers do send the original as
> well, e.g. via offsets into the source stream; from the implementation
> point of view this may be costly, and from the practical point of view
> it is seldom useful).
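
The split between parsing and serializing can be sketched as follows, again assuming Python's xml.etree.ElementTree rather than libxml2 (the sample sentence is invented). The references are decoded while parsing. The serializer then writes out the decoded characters and escapes only those that would otherwise break well-formedness, such as a bare ampersand:

```python
import xml.etree.ElementTree as ET

# Input mixes a predefined entity (&amp;) with a numeric reference (&#39;).
root = ET.fromstring("<p>Tom &amp; Jerry&#39;s</p>")

# Decoding has already happened by the time the tree exists.
decoded = root.text

# The serializer sees only characters and re-escapes what it must:
# "&" has to become "&amp;" again, while "'" is legal as-is in text content.
serialized = ET.tostring(root, encoding="unicode")

print(decoded)     # Tom & Jerry's
print(serialized)  # <p>Tom &amp; Jerry's</p>
```

So the ampersand comes out as &amp; because the serializer has to escape it, not because the parser remembered how the input spelled it.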
>
> So to sum up: if you don't want a parser to parse your input, then
> don't use one! Just process the XML as text (e.g. using regexps,
> lexer, tokenizer, or a parser that gives you real low-level access),
> since apparently you are not interested in the content of the XML
> document but one particular textual representation of the content in
> XML.
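
A rough sketch of the "process it as text" route, assuming a plain regular expression is good enough for the input at hand. The names NUMERIC_REF and find_numeric_refs are made up for illustration. The pattern does not understand comments, CDATA sections or processing instructions, so it may report matches inside them:

```python
import re

# Decimal (&#39;) or hexadecimal (&#x27;) character references.
# XML requires a lowercase "x" in the hexadecimal form.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|x[0-9A-Fa-f]+);")

def find_numeric_refs(xml_text):
    """Return (offset, reference) pairs without decoding anything."""
    return [(m.start(), m.group(0)) for m in NUMERIC_REF.finditer(xml_text)]

sample = "<p>it&#39;s &amp; it's &#x27;</p>"
for offset, ref in find_numeric_refs(sample):
    print(offset, ref)
```

Because the text is never parsed, the original spelling of each reference stays available, at the price of losing the structure a parser would provide.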
>
> -- Petr
So, given that we accept this, why is & not converted to "'"?
--
Nicolas Mendoza
http://my.opera.com/nicomen
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs