xml-dev
Re: [xml-dev] Quick Review of XML 1.1 Candidate Recommendation
by Rick Jelliffe
Oct 17 2002 11:07AM
From: "Tim Bray" <tbray@[...].com> 

>  My problem is that XML has de facto been a significant step forward for 
>  interoperability between heterogeneous systems, and this seems like a 
>  step backward.  At the moment, we can say confidently that XML markup 
>  exposes logical structure unambiguously, and the content is text, which 
>  means a sequence of unicode characters, and the characters have the 
>  semantics that Unicode says they have.  This is fine for characters such 
>  as 'a' or &#x222b; (the integral sign), but the range &#x0; - &#x1f; is 
>  another kettle of fish.  By my reading, none of the characters in the 
>  ranges 0-#x7, #xb, #xe-#x1a have any agreed-upon semantics de jure or de 
>  facto (let's go down to the mall and do some &#x16;). 
 
From about Unicode 3.0 onwards, the U+0080-U+009F code points are
occupied by the ISO C1 controls, unless specifically overridden;
neither XML 1.0 nor XML 1.1 specifically overrides them.

See http://www.unicode.org/unicode/uni2book/ch13.pdf  s.13.1
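To make the C1 point concrete, here is a small Python sketch that classifies code points under the character productions of XML 1.0 and XML 1.1. The ranges are assumed from the XML 1.1 Recommendation as eventually published (Char and RestrictedChar); the 2002 Candidate Recommendation under discussion may differ in detail, and the function names are hypothetical.

```python
# Sketch: classify code points under XML 1.0 and XML 1.1.
# Ranges assumed from the published XML 1.0 [2] Char and XML 1.1 [2] Char /
# [2a] RestrictedChar productions; function names are hypothetical.


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: anything else is illegal, even as a reference."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except NUL, surrogates, U+FFFE and U+FFFF."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only as a numeric character reference.

    U+0085 (NEL) is left out because XML 1.1 treats it as a line end.
    """
    return (0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)


def status(cp: int) -> tuple:
    """Return (XML 1.0 status, XML 1.1 status) for one code point."""
    v10 = "direct" if is_xml10_char(cp) else "forbidden"
    if not is_xml11_char(cp):
        v11 = "forbidden"
    elif is_xml11_restricted(cp):
        v11 = "reference only"
    else:
        v11 = "direct"
    return v10, v11


for cp in (0x00, 0x09, 0x16, 0x41, 0x85, 0x92):
    print(f"U+{cp:04X}", *status(cp))
```

Under these assumptions U+0092 moves from "direct" in XML 1.0 to "reference only" in XML 1.1, U+0016 moves from "forbidden" to "reference only" (so Tim's &#x16; becomes legal), and U+0000 stays forbidden in both.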

XML 1.1 is intended to cope with Unicode 3.n, and the newly fixed meaning
of the C1 controls is one of the things it has to cope with.  So the
backwards compatibility issue is really one that springs from Unicode, not
from XML IMHO.  It was pretty sus (or a convenient hack) to use the C1
code points before.

Tim's point about needing to follow the Unicode semantics is well made and
important, but I think the XML 1.1 draft *does* do this. The semantics of
a text stream are that a control character appearing in it is a control
character, to be interpreted, stripped or acted on.  A control character
that is meant (rightly or wrongly) to be part of the data content should
never be sent directly: it was a mistake of XML 1.0 to allow C1 characters
to appear directly.
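One way to honour this rule on the sending side is sketched below in Python: a serializer for XML 1.1 character data that writes markup characters as entity references, writes every RestrictedChar as a hexadecimal NCR, and refuses NUL outright. The function name `escape_text_xml11` and the choice to also escape CR, NEL and U+2028 are assumptions of this sketch, not something the draft mandates; the ranges are assumed from the XML 1.1 Recommendation as eventually published.

```python
# Sketch of an XML 1.1 character-data escaper (hypothetical helper, not a
# library API). Ranges assumed from the published XML 1.1 Recommendation.


def is_xml11_char(cp: int) -> bool:
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    return (0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)


MARKUP = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
# Literal CR, NEL and LINE SEPARATOR would be turned into LF by XML 1.1
# end-of-line handling; references to them are not, so they survive.
LINE_ENDS = {0xD, 0x85, 0x2028}


def escape_text_xml11(s: str) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in MARKUP:
            out.append(MARKUP[ch])
        elif not is_xml11_char(cp):
            # NUL (and surrogates, U+FFFE, U+FFFF) have no representation at all.
            raise ValueError(f"U+{cp:04X} cannot be represented in XML 1.1")
        elif is_xml11_restricted(cp) or cp in LINE_ENDS:
            # Controls are never sent directly, only as NCRs.
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)


print(escape_text_xml11("a\x16b<c\x92"))  # a&#x16;b&lt;c&#x92;
```

The data content keeps its control characters, but the stream itself never carries them raw, which is the layering described next.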

Ultimately, it comes down to a model of layering.  I believe the layering
is:
   applications and data stores
   -------------------------------------------------------------------------------
   Infoset data (can include controls, but not NUL)
   -------------------------------------------------------------------------------
   XML, which must be compatible with "textual" text/* MIME
   -------------------------------------------------------------------------------
   text data being sent as a data stream, by some system using controls
   -------------------------------------------------------------------------------
   packets         
   -------------------------------------------------------------------------------

That is closer to the old telnet/modem-ish model that underlies the RFCs,
and XML 1.1 supports this model better than XML 1.0 does.

The second prong of Tim's argument is that in XML "the content is text"
(i.e., not binary), by which he is suggesting that non-text data should
not be serialized directly as XML but first encoded using, say, Base64.
Unfortunately, this currently requires some kind of schema processing
and some kind of PSVI to extract the data: a lot of overhead for a little
feature. And W3C XML Schema's base64Binary has the problem that there is
no standard way to say what the data is after it is decoded: what is its
notation or MIME type? So Base64 can only be used with private conventions
anyway.
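A minimal Python sketch of the Base64 route, using only the standard library, shows the problem: a plain parser hands back an opaque string, and what the decoded bytes mean has to travel by some private convention. Here that convention is an invented `content-type` attribute on an invented `payload` element; both names are hypothetical.

```python
# Sketch: carrying binary data as Base64 text under a *private* convention.
# The element name "payload" and attribute "content-type" are hypothetical;
# nothing in xs:base64Binary says what the decoded octets are.
import base64
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def wrap_binary(data: bytes, mime_type: str) -> str:
    elem = ET.Element("payload", {"content-type": mime_type})
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text: str) -> Tuple[bytes, Optional[str]]:
    # A plain (non-schema-aware) parser just returns a string; decoding
    # and interpreting it is left to application code.
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text or ""), elem.get("content-type")


doc = wrap_binary(b"\x00\x01\x02\xff", "application/octet-stream")
print(doc)  # <payload content-type="application/octet-stream">AAEC/w==</payload>
print(unwrap_binary(doc))
```

Note that even a NUL byte survives this route, at the cost of an encoding step that the XML layer knows nothing about.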

As Richard comments, arbitrary binary data still cannot be sent, because
the U+0000 character NULL is not available even as a numeric character
reference. If we have no objection to Base64-encoded data content, I don't
see the problem with control characters sent as NCRs: both are textual and
opaque.
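The NUL limitation can be quantified with a short Python sketch: treat each octet as the code point of the same value (an ISO-8859-1 mapping, chosen here only for illustration) and count how many of the 256 values each version cannot carry at all, directly or as a reference. The Char ranges are assumed from the published XML 1.0 and XML 1.1 Recommendations, and the helper names are hypothetical.

```python
# Sketch: which octet values (mapped to U+0000..U+00FF) can XML carry at all?
# Ranges assumed from the published XML 1.0 and XML 1.1 Char productions.


def is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def unrepresentable(is_char) -> list:
    """Octet values whose same-valued code point is not a legal Char."""
    return [b for b in range(256) if not is_char(b)]


missing10 = unrepresentable(is_xml10_char)
missing11 = unrepresentable(is_xml11_char)
print(len(missing10), len(missing11))  # 29 1
print([f"U+{cp:04X}" for cp in missing11])  # ['U+0000']
```

So with NCRs XML 1.1 can carry 255 of the 256 octet values, while XML 1.0 cannot carry 29 of them in any form; arbitrary binary still needs Base64 or similar.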

>  And furthermore, the reason why our friends at Microsoft & IBM et al 
>  want this is so they can take filthy dirty data out of database fields 
>  and wrap XML tags around it and claim interoperability, which is pretty 
>  questionable. -Tim

As long as it is represented as text, why are the controls (when sanitized)
any less filthy than the Private Use Area (PUA) characters?  I am all in
favour of making XML more comprehensive and more "textual" as a notation
(in the terminology of the RFC for text/* MIME types). When a change is
still safe (no nulls), fits the Internet layers better, is more mainstream
SGML-ish, *and* improves robustness no end (better encoding detection), it
is a pretty credible package.
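The "better encoding detection" point can be illustrated with a Python sketch. Because XML 1.1 makes a document containing a literal RestrictedChar not well-formed, a windows-1252 document wrongly labelled ISO-8859-1 decodes its curly quotes (octets 0x93 and 0x94) into U+0093 and U+0094 and is rejected, whereas XML 1.0 would silently accept them. The helper name is hypothetical, and scanning decoded text this way is only a simplified stand-in for what a conforming parser does.

```python
# Sketch: spot C1 controls that appear literally in decoded XML 1.1 text.
# Literal U+007F-U+0084 and U+0086-U+009F make an XML 1.1 document
# not well-formed; character references such as "&#x93;" are plain ASCII
# here, so they are (correctly) not flagged.


def literal_restricted_c1(text: str) -> list:
    """Offsets of DEL/C1 characters (excluding NEL) that appear literally."""
    return [i for i, ch in enumerate(text)
            if 0x7F <= ord(ch) <= 0x9F and ord(ch) != 0x85]


raw = "<p>\u201cquoted\u201d</p>".encode("cp1252")  # curly quotes -> 0x93, 0x94

print(literal_restricted_c1(raw.decode("iso-8859-1")))  # [3, 10] (mislabelled)
print(literal_restricted_c1(raw.decode("cp1252")))      # [] (correct label)
```

Under XML 1.0 both decodings would be accepted, so the mislabelling would go unnoticed.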

Cheers
Rick Jelliffe

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org> , an
initiative of OASIS <http://www.oasis-open.org> 

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl> 