xml-dev
Re: Why the Infoset?
by Rick JELLIFFE
Jul 29 2000 11:50AM
"W. E. Perry" wrote:
>  "Paul W. Abrahams" wrote:
>
>  > Looking at it another way, how would the XML world be poorer if the Infoset did
>  > not exist?
>
>  It is far worse than that, I fear. The Infoset is the cuckoo's egg in the XML nest.
>  The fundamental innovation of XML 1.0 was the concept of well-formedness, which as
>  a radical insight amounts to this: the instance text--that is, content plus
>  markup--is entirely self-sufficient both as syntax and as the basis for derived or
>  elaborated semantics.

I disagree. The basis of SGML'86 was that a rooted, directed, cyclic
graph with an attribute-value tree framework, allowing a handful of
general distinctions on the edges (child, parent, next, attribute,
IDREF, etc.) and a handful of general types on the named nodes
(element, comment, PI, etc.), was, when coupled with a simultaneous
rooted graph of entities with a handful of general types (NDATA, sgml),
sufficient for an enormous number of complex problems. On top of this
information model came the need to cope with an enormous number of
possible notations and syntaxes.

XML's WF-only approach is not "entirely" self-sufficient for anything
to do with graphs.

(The problem is only fixed by hard-coding, i.e. XLink, and hard-coding
requires universal names, i.e. Namespaces using some
public-identifier-like registration system, i.e. URIs.)

What XML did was to say that a lot of users only need simple AV-trees,
so let's allow them to have them with little fuss. And it said that
people could agree on a syntax.
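
To make the point about graphs concrete, here is a minimal sketch in
Python using the standard xml.etree.ElementTree module. The document,
the attribute names "name" and "see", and the helper graph_edges are
all made up for illustration. The parent/child tree falls out of
well-formedness alone, but the cross-reference edges only appear once
someone outside the instance says which attributes act as ID and IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: "name" and "see" are illustrative attribute names.
DOC = """
<doc>
  <sec name="intro" see="methods"/>
  <sec name="methods" see="intro"/>
</doc>
"""

def graph_edges(root, id_attr, ref_attr):
    """Return (source id, target id) pairs for reference edges.

    id_attr and ref_attr are supplied by the caller: nothing in a
    merely well-formed document says which attributes are identifiers
    and which are references.
    """
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr)}
    edges = []
    for el in root.iter():
        target = el.get(ref_attr)
        if target in by_id:
            edges.append((el.get(id_attr), target))
    return edges

root = ET.fromstring(DOC)

# The tree edges (parent/child) come for free from well-formedness.
children = [child.tag for child in root]

# The cross-reference edges need outside knowledge of the attribute
# roles, which a DTD would declare as ID and IDREF.
edges = graph_edges(root, id_attr="name", ref_attr="see")
```

With the wrong guess about the attribute roles the same document yields
no reference edges at all, which is the sense in which the instance by
itself is not sufficient for graphs.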

>  Since no DTD nor other content
>  model or pre-ordained schema is required for the parsing, and therefore the
>  interpretation, of the resulting instance document, it is not necessary to secure
>  anyone's agreement to the extension of the content model before simply extending
>  the markup vocabulary of the instance document. 

I think this is too hard on schemas: what a schema does, in part, is
specify which constraints the document obeys beyond those of WF XML.
These constraints allow more efficient handling of data: if I know that
my content model is (a,b,c)+ and that it is closed, then I can allocate
a list with three slots for them, and I know that the XPath
a[position()=1] on the parent will always succeed. If I know that an
element is a date, I can store it in a database as a date, not a
string. If I know that a value or combination of values is unique, I
can use it as a key for faster access to data.
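
The three optimisations above might look roughly like this in Python.
It is only a sketch under stated assumptions: the element names come
from the (a,b,c)+ example, while the sample data, the ISO date format
for b, the uniqueness of a, and the function load_rows are invented for
illustration. None of these guarantees hold unless a schema provides
them.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical data matching a closed (a,b,c)+ content model.
DOC = """
<rows>
  <a>k1</a><b>2000-07-29</b><c>first</c>
  <a>k2</a><b>2000-07-30</b><c>second</c>
</rows>
"""

def load_rows(parent):
    """Assumes a schema guarantees a closed (a,b,c)+ content model,
    that b is an ISO-format date, and that a is unique in the parent."""
    kids = list(parent)
    index = {}
    # Closed (a,b,c)+ means children arrive in groups of exactly three,
    # so each group fits a fixed three-slot record with no tag lookup.
    for i in range(0, len(kids), 3):
        a, b, c = kids[i:i + 3]
        # b is known to be a date, so store it as one, not as a string.
        record = (a.text, date.fromisoformat(b.text), c.text)
        # a is known to be unique, so it can serve as a lookup key.
        index[a.text] = record
    return index

rows = load_rows(ET.fromstring(DOC))
```

Without the schema, the same loader would have to check every tag, keep
dates as strings, and scan for duplicates before trusting any key.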

The idea of a syntax with no schema/DTD is hardly new: in part, it was
the infelicities of these that caused SGML'86 to take such a strong and
radical view: if anyone can put any element anywhere, how can a
consumer contractually require an information producer to produce
certain information? LISP or Ada or any of the languages with
position-independent and nameable parameters provide the same basic
capabilities as XML WF: why are they not good markup languages? It is
the ability to constrain data by schemas that is the key.

If data were all atomic, and each datum was described by a universal
name, and no two documents were similar, then I think William's view
would be pretty close to the mark: documents could be ad hoc
assemblages of elements used by applications which handled each
document as best they could, perhaps with the aid of private schemas to
check that all the information required was present. But truth is not
atomic: a number may be complex, a quantity will have a unit as well as
a value, a table has rows, love and marriage go together like a horse
and carriage.

The other problem with William's view is the idea that documents don't
exist in fairly similar runs: document types. A schema is a way for the
generator of the document to tell the consumer of the document the
rules it has used. A good schema language can allow the consumer of the
document to know such things as
 * "If I delete element X here, should I expect all other systems that
   are in the loop to still process the document correctly?"
 * "If I add element Y here, will that break other people's systems?"
 * "Do I really need to check that condition Z holds at this point, or
   can I trust the generic contract-checking system?"
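
The first two questions can be answered mechanically once the rules are
written down. The following is a toy sketch in Python, not any real
schema language: ElementRule, can_delete, can_add, and the "invoice"
example are all hypothetical, and occurrence counts and ordering are
deliberately ignored.

```python
from dataclasses import dataclass, field

@dataclass
class ElementRule:
    """Hypothetical, simplified stand-in for a schema declaration."""
    required: set = field(default_factory=set)  # children that must appear
    optional: set = field(default_factory=set)  # children that may appear
    open_content: bool = False                  # undeclared children allowed?

def can_delete(rule, child):
    # Removing a child keeps the contract only if it is not required.
    return child not in rule.required

def can_add(rule, child):
    # Adding a child keeps the contract if it is declared for this
    # element, or if the content model is open.
    return (child in rule.required
            or child in rule.optional
            or rule.open_content)

# Illustrative rule: an invoice must have a total and may have a note.
invoice = ElementRule(required={"total"}, optional={"note"})
```

Here can_delete(invoice, "note") holds while can_delete(invoice,
"total") does not, and can_add(invoice, "discount") fails because the
model is closed. A consumer with no schema has no basis for answering
either question.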

Rick Jelliffe