[xml-dev] Come On, DTD, Come On! Thoughts on DSDL Part 9
by John Cowan
Jun 11 2002 5:31PM
ISO/IEC 19757-9 is currently an empty hole titled "Datatype- and
namespace-aware DTDs". This is a ragbag of ideas to fill that hole.
I am assuming that the context for extending DTDs is not redefining
XML, but rather creating an enhanced XML DTD format which can be used by an
external validator. Examples of existing external validators are Jing
for RELAX NG, XSV for W3C XML Schema, and Sun MSV for many different
schema formats (including DTDs). Enhanced DTDs would not be acceptable
to XML validating parsers.
I think the following enhancements to standard XML DTDs are worth
considering. They are directed to making DTD authoring easier and
more flexible. Nothing is introduced that is beyond the current schema
language state of the art.
1) The NS declaration. Declarations of the form <!NS name SYSTEM "uri">
are allowed to define the namespaces associated with CNames in ELEMENT
and ATTLIST declarations. As is the case for other schema languages, in
the presence of a known prefix, name matching is done on the universal
name (URI + local-part) rather than the CName. The default namespace
is declared using #DEFAULT in place of the name.
Example:
<!NS foo SYSTEM "http://www.example.com/foo">
<!NS bar SYSTEM "http://www.example.com/bar">
<!ELEMENT foo:a (foo:b)>
<!ELEMENT bar:a EMPTY>
Issue: Is it an error to mention a prefix that is not declared? My
answer: no; if this is done, name matching falls back to string identity.
Issue: is the keyword SYSTEM useful?
Issue: this does not help when prefixes are not used consistently
throughout an instance. Do we care? My answer: no.
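The matching rule in #1 can be sketched in Python as follows. This is only an illustration, assuming the NS declarations (and the instance's own namespace declarations) have already been collected into prefix-to-URI maps; the function names universal_name and names_match are hypothetical and not part of the proposal. An undeclared prefix falls back to string identity, per my answer to the first issue.

```python
def universal_name(cname, ns_map, default_ns=None):
    """Resolve a CName to (uri, local-part).

    If the prefix is not declared, return the raw string so that
    comparison falls back to plain string identity.
    """
    prefix, sep, local = cname.partition(":")
    if not sep:
        # Unprefixed name: use the #DEFAULT namespace if one was declared.
        return (default_ns, cname) if default_ns is not None else cname
    if prefix in ns_map:
        return (ns_map[prefix], local)
    return cname


def names_match(dtd_name, dtd_ns, inst_name, inst_ns):
    """Compare a name from the DTD with a name from the instance."""
    return universal_name(dtd_name, dtd_ns) == universal_name(inst_name, inst_ns)


dtd_ns = {"foo": "http://www.example.com/foo",
          "bar": "http://www.example.com/bar"}

# The instance uses a different prefix for the same URI: still a match.
print(names_match("foo:a", dtd_ns, "f:a", {"f": "http://www.example.com/foo"}))
# Same local part, different namespace: no match.
print(names_match("foo:a", dtd_ns, "b:a", {"b": "http://www.example.com/bar"}))
# Undeclared prefix on both sides: string identity applies.
print(names_match("x:a", {}, "x:a", {}))
```

Note that the instance may use any prefix it likes for a declared URI; only the third case depends on the prefixes being spelled the same.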
2) Attribute data types. The names that can appear in an ATTLIST
declaration directly after an attribute name are extended to include
the datatype names of part 5 (i.e. XSD simple types).
Example:
<!ATTLIST baz
foo integer #IMPLIED
baz integer #REQUIRED>
Issue: do we need to make the datatype list extensible? If so, we could
use QNames and a DATATYPE declaration, rather like the compact syntax
of RELAX NG.
3) Element simple datatypes. Likewise, unparenthesized content models
in ELEMENT declarations are extended from just ANY and EMPTY to include
these same datatypes.
Example: <!ELEMENT foo nonNegativeInteger>
4) Datatype lists. In either #2 or #3 context, a simple datatype name
can be replaced by "LIST(name)" to indicate a whitespace-separated
list of strings matching the datatype. IDREFS is equivalent to LIST(IDREF),
and ENTITIES is equivalent to LIST(ENTITY).
5) Datatype choice. In either #2 or #3 context, a simple or LIST-wrapped
datatype name can be replaced by |-separated names, to indicate a choice
(derivation by union in WXS terms).
Example: <!ELEMENT bar integer|Name>
Issue: what do we do about XSD facets? They are important but don't
easily fit into the rigid DTD syntax.
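One way an external validator might treat #4 and #5 is as combinators over per-datatype checkers. The Python sketch below is an assumption-laden illustration: is_integer follows the XSD integer lexical form, is_name is a deliberately simplified ASCII-only stand-in for the XML Name production, and the helpers list_of and choice are hypothetical names. Facets are not handled.

```python
import re


def is_integer(s):
    # XSD integer lexical form: optional sign followed by decimal digits.
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None


def is_name(s):
    # ASCII-only approximation of an XML Name (an assumption for brevity).
    return re.fullmatch(r"[A-Za-z_:][A-Za-z0-9_:.\-]*", s) is not None


def list_of(check):
    """LIST(name): every whitespace-separated token must pass the checker."""
    def check_list(value):
        return all(check(token) for token in value.split())
    return check_list


def choice(*checks):
    """name1|name2|...: the value must pass at least one checker."""
    def check_choice(value):
        return any(check(value) for check in checks)
    return check_choice


int_list = list_of(is_integer)            # LIST(integer)
int_or_name = choice(is_integer, is_name)  # integer|Name

print(int_list("1 -2 +3"))   # every token is an integer
print(int_list("1 x"))       # "x" is not an integer
print(int_or_name("abc"))    # passes as a name
print(int_or_name("4 2"))    # neither a single integer nor a name
```

As written, list_of accepts an empty list; whether LIST should require at least one item (as IDREFS does) is left open here.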
6) Restore & connector. Bring back the & connector, either with the
SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
semantics. The difference is that, given the content model "A & B+",
the element sequences A, B, B, B and B, B, B, A will match in either case,
but B, A, B, B will only match using interleave semantics.
Issue: SGML or interleave? My answer: interleave
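The difference between the two readings of "A & B+" can be checked with a small Python sketch. It assumes single-character element names and operands that use disjoint sets of names, which keeps interleave matching to a simple projection; a real validator would need a more general algorithm. The function names sgml_and and interleave are hypothetical.

```python
import re
from itertools import permutations


def sgml_and(operands, seq):
    """SGML semantics: each operand appears whole, in any order."""
    text = "".join(seq)
    return any(
        re.fullmatch("".join(f"(?:{p})" for p in order), text) is not None
        for order in permutations(operands.values())
    )


def interleave(operands, seq):
    """Interleave semantics, assuming the operands use disjoint names.

    Project the sequence onto each operand's names and match every
    projection against that operand's pattern.
    """
    known = set().union(*operands)
    if any(sym not in known for sym in seq):
        return False
    return all(
        re.fullmatch(pattern, "".join(s for s in seq if s in names)) is not None
        for names, pattern in operands.items()
    )


# The content model "A & B+": operand names mapped to a regex over them.
model = {frozenset("A"): "A", frozenset("B"): "B+"}

for seq in ("ABBB", "BBBA", "BABB"):
    print(seq, sgml_and(model, seq), interleave(model, seq))
```

Only the last sequence, B, A, B, B, separates the two semantics: it fails under the SGML reading and passes under interleave.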
7) Abandon SGML 1-ambiguity rules. Instead, allow complete flexibility of
content models. See James Clark's discussion in "The Design of RELAX NG".
8) Restore multiple element and attribute names separated by |s.
This makes for conciseness and easy authoring. These constructs were
dumped in XML DTDs because they imposed extra cost on validating parsers,
but in this model validation is something done outside parsing, so higher
cost is worthwhile.
9) Fixed element content. Allow ELEMENT declarations to specify "#FIXED
'value'" after a datatype.
Example: <!ELEMENT foo integer #FIXED "5">
This means that the content of any foo element must be equivalent to 5
according to the "integer" datatype's equivalence relation: therefore,
05, 005, +5, etc. will pass validation.
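In other words, the comparison is made in the value space, not on the lexical form. A minimal Python sketch, with the hypothetical helpers integer_value and matches_fixed, and assuming that leading and trailing XML whitespace is stripped before the lexical check:

```python
import re


def integer_value(lexical):
    """Map an XSD integer lexical form to its value, or None if invalid."""
    text = lexical.strip(" \t\r\n")
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return None
    return int(text)


def matches_fixed(content, fixed):
    """True if content is a valid integer equal in value to the #FIXED value."""
    value = integer_value(content)
    return value is not None and value == integer_value(fixed)


for content in ("5", "05", "005", "+5", "-5", "5.0"):
    print(repr(content), matches_fixed(content, "5"))
```

The first four contents pass against #FIXED "5"; "-5" fails on value and "5.0" fails because it is not in the integer lexical space at all.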
General issue: Should there be some way to indicate candidate roots?
In existing DTDs, any element can be a root.
General issue: We need to figure out what to do if the instance contains
an internal DTD (by which I mean an internal subset, a reference to an
external subset, or both). Should internal validation be required,
permitted, or forbidden when doing external validation? (I take it
for granted that if it is to be done, it will be done in the parser,
i.e. first.) What is the effect of attribute defaulting specified by
the internal DTD on the external validation process? Should internal
validation be done before external validation, or turned off?
--
John Cowan <jcowan@[...].com> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_