The parsers do not validate the XML document. They do parse the
internal DTD and, on request, the external DTD and external entities,
provided you resolve the identifiers of the external entities with the
-externalentitycommand script (see there).
Additionally, the Tcl extension code that implements this command
provides an API for adding handlers coded at the C level. So far,
one such parser extension command exists: "tdom". The handler set
installed by this extension builds an in-memory "tDOM" DOM tree
while the parser is parsing the input.
It is possible to register an arbitrary number of different
handler scripts and C level handlers for most of the events. If the
event occurs, they are called in turn.
- -namespace
-
Enables namespace parsing. You must use this option when
creating the parser with the expat or xml::parser command. You can't
enable (or disable) namespace parsing afterwards with <parserobj>
configure ....
- -final boolean
-
This option indicates whether the document data next presented
to the parse method is the final part of the document. A value of
"0" indicates that more data is expected. A value of "1" indicates
that no more data is expected. The default value is "1".
If this option is set to "0" then the parser will not report
certain errors if the XML data is not well-formed upon end of
input, such as unclosed or unbalanced start or end tags. Instead
some data may be saved by the parser until the next call to the
parse method, thus delaying the reporting of some of the data.
If this option is set to "1" then documents which are not
well-formed upon end of input will generate an error.
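The following is a minimal sketch of feeding one document to the parser in several pieces, assuming the tdom package is installed. The chunk boundaries and the proc name onStart are made up for illustration.

```tcl
package require tdom

# Collect the element names in the order they are reported.
set seen {}
proc onStart {name attlist} {
    lappend ::seen $name
}

set parser [expat -elementstartcommand onStart]

# More data will follow, so unclosed tags are not an error yet.
$parser configure -final 0
$parser parse {<root><item>first</item>}
$parser parse {<item>second</item>}

# The next chunk is the last one; the document must be complete now.
$parser configure -final 1
$parser parse {</root>}

puts $seen
$parser free
```

With -final left at its default of "1", the first call to parse in this sketch would be expected to fail, because the root element is still open at the end of that chunk.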
- -baseurl url
-
Reports the base URL of the document to the parser.
- -elementstartcommand script
-
Specifies a Tcl command to associate with the start tag of an
element. The actual command consists of this option followed by at
least two arguments: the element type name and the attribute
list.
The attribute list is a Tcl list consisting of name/value pairs,
suitable for passing to the array set Tcl command.
Example:
proc HandleStart {name attlist} {
    puts stderr "Element start ==> $name has attributes $attlist"
}
$parser configure -elementstartcommand HandleStart
$parser parse {<test id="123"></test>}
This would result in the following command being invoked:
HandleStart test {id 123}
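Because the attribute list is a flat name/value list, a handler can load it into an array and look attributes up by name. This is a small sketch, assuming the tdom package is installed; the array name att, the global lastId and the sample document are illustrative only.

```tcl
package require tdom

set lastId ""
proc HandleStart {name attlist} {
    # Turn the name/value list into a local array for lookup by name.
    array set att $attlist
    if {[info exists att(id)]} {
        set ::lastId $att(id)
        puts "Element $name has id $att(id)"
    }
}

set parser [expat -elementstartcommand HandleStart]
$parser parse {<test id="123" class="demo"></test>}
$parser free
```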
- -elementendcommand script
-
Specifies a Tcl command to associate with the end tag of an
element. The actual command consists of this option followed by at
least one argument: the element type name. In addition, if the
-reportempty option is set then the command may be invoked with the
-empty configuration option to indicate whether it is an empty
element. See the description of the -reportempty option for an
example.
Example:
proc HandleEnd {name} {
    puts stderr "Element end ==> $name"
}
$parser configure -elementendcommand HandleEnd
$parser parse {<test id="123"></test>}
This would result in the following command being invoked:
HandleEnd test
- -characterdatacommand script
-
Specifies a Tcl command to associate with character data in the
document, i.e. text. The actual command consists of this option
followed by one argument: the text.
It is not guaranteed that character data will be passed to the
application in a single call to this command. That is, the
application should be prepared to receive multiple invocations of
this callback with no intervening callbacks from other
features.
Example:
proc HandleText {data} {
    puts stderr "Character data ==> $data"
}
$parser configure -characterdatacommand HandleText
$parser parse {<test>this is a test document</test>}
This would result in the following command being invoked:
HandleText {this is a test document}
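Since the text of one element may arrive in several calls, a handler that needs the complete text can accumulate it and use it at the end tag. The sketch below shows one way to do this, assuming the tdom package is installed; the global variables buffer and collected are hypothetical names.

```tcl
package require tdom

set buffer ""
set collected {}

# May be called more than once for one run of text: only accumulate.
proc HandleText {data} {
    append ::buffer $data
}

# The end tag is a safe point to use the accumulated text.
proc HandleEnd {name} {
    lappend ::collected $name $::buffer
    set ::buffer ""
}

set parser [expat -characterdatacommand HandleText \
                  -elementendcommand HandleEnd]
$parser parse {<test>this is a &amp; test document</test>}
puts $collected
$parser free
```

This simple version is only suitable for elements without child elements; for mixed content the buffer would also have to be handled in the start tag handler.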
- -processinginstructioncommand script
-
Specifies a Tcl command to associate with processing
instructions in the document. The actual command consists of this
option followed by two arguments: the PI target and the PI
data.
Example:
proc HandlePI {target data} {
    puts stderr "Processing instruction ==> $target $data"
}
$parser configure -processinginstructioncommand HandlePI
$parser parse {<test><?special this is a processing instruction?></test>}
This would result in the following command being invoked:
HandlePI special {this is a processing instruction}
- -notationdeclcommand script
-
Specifies a Tcl command to associate with notation declarations
in the document. The actual command consists of this option
followed by four arguments: the notation name, the base URI of the
document (that is, whatever was set by the -baseurl option), the
system identifier and the public identifier. The notation name is
never empty; the other arguments may be.
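As an illustration of the four arguments, here is a small sketch, assuming the tdom package is installed. The handler name HandleNotation, the global notations and the sample DTD are made up for this example.

```tcl
package require tdom

set notations {}
proc HandleNotation {name base systemId publicId} {
    # Remember every declared notation with all reported arguments.
    lappend ::notations [list $name $base $systemId $publicId]
}

set parser [expat -notationdeclcommand HandleNotation \
                  -baseurl "file:///local/doc/doc.xml"]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!NOTATION gif SYSTEM "image/gif">
<!ELEMENT test EMPTY>
]>
<test/>}
puts $notations
$parser free
```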
- -externalentitycommand script
-
Specifies a Tcl command to associate with references to external
entities in the document. The actual command consists of this
option followed by three arguments: the base URI, the system
identifier of the entity and the public identifier of the entity.
The base URI and the public identifier may be the empty list.
This handler script has to return a Tcl list consisting of three
elements. The first element of this list signals how the external
entity is returned to the processor. At the moment, the three
allowed types are "string", "channel" and "filename". The second
element of the list has to be the (absolute) base URI of the
external entity to be parsed. The third element of the list is the
data: the content of the external entity as a string in the case of
type "string", the name of a Tcl channel in the case of type
"channel", or the path to the external entity to be read in the case
of type "filename". Behind the scenes, the external entity referenced
by the returned Tcl channel, string or file name will be parsed with
an expat external entity parser with the same handler sets as the
main parser. If parsing of the external entity fails, the whole
parsing is stopped with an error message. If a Tcl command
registered as -externalentitycommand isn't able to resolve an
external entity, it is allowed to return TCL_CONTINUE. In this case,
the wrapper gives the next registered -externalentitycommand a try.
If no -externalentitycommand is able to handle the external entity,
parsing stops with an error.
Example:
proc externalEntityRefHandler {base systemId publicId} {
    if {![regexp {^[a-zA-Z]+:/} $systemId]} {
        # Relative system identifier: resolve it against the base URI.
        regsub {^[a-zA-Z]+:} $base {} base
        set basedir [file dirname $base]
        set systemId "[set basedir]/[set systemId]"
    } else {
        # Absolute system identifier: strip the URI scheme.
        regsub {^[a-zA-Z]+:} $systemId {} systemId
    }
    if {[catch {set fd [open $systemId]}]} {
        return -code error "Failed to open external entity $systemId"
    }
    return [list channel $systemId $fd]
}
set parser [expat -externalentitycommand externalEntityRefHandler \
    -baseurl "file:///local/doc/doc.xml" \
    -paramentityparsing notstandalone]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test SYSTEM "test.dtd">
<test/>}
This would result in the following command being invoked:
externalEntityRefHandler file:///local/doc/doc.xml test.dtd {}
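A handler may also hand the entity content back directly with the "string" type, and pass on entities it does not know by returning with the continue code. The sketch below assumes the tdom package is installed; the in-memory DTD text, the URI file:///local/doc/test.dtd and the proc names are hypothetical.

```tcl
package require tdom

set declared {}
proc elDeclHandler {name content} {
    lappend ::declared $name
}

proc resolveAsString {base systemId publicId} {
    if {$systemId eq "test.dtd"} {
        # Type, absolute base URI of the entity, entity content.
        return [list string "file:///local/doc/test.dtd" \
                    {<!ELEMENT test EMPTY>}]
    }
    # Not ours: let the next registered handler try.
    return -code continue
}

set parser [expat -externalentitycommand resolveAsString \
                  -elementdeclcommand elDeclHandler \
                  -baseurl "file:///local/doc/doc.xml" \
                  -paramentityparsing notstandalone]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test SYSTEM "test.dtd">
<test/>}
puts $declared
$parser free
```

The element declaration handler is registered here only to make it visible that the returned string was parsed as the external subset.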
An attempt to resolve external entities via this handler script is
made only if necessary. This means that external parameter entities
trigger this handler only if -paramentityparsing is used with
argument "always", or if -paramentityparsing is used with argument
"notstandalone" and the document isn't marked as standalone.
- -unknownencodingcommand script
-
Not implemented at Tcl level.
- -startnamespacedeclcommand script
-
Specifies a Tcl command to associate with the start of the scope
of namespace declarations in the document. The actual command consists
of this option followed by two arguments: the namespace prefix and
the namespace URI. For an xmlns attribute, prefix will be the empty
list. For an xmlns="" attribute, uri will be the empty list. The
calls to the start and end element handlers occur between the calls
to the start and end namespace declaration handlers.
- -endnamespacedeclcommand script
-
Specifies a Tcl command to associate with the end of the scope of
namespace declarations in the document. The actual command consists of
this option followed by the namespace prefix as argument. In case of an
xmlns attribute, prefix will be the empty list. The calls to the
start and end element handlers occur between the calls to the start
and end namespace declaration handlers.
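The ordering of namespace declaration and element callbacks can be made visible by logging all four events into one list. This is a sketch, assuming the tdom package is installed and that the parser is created with -namespace; the proc names and the namespace URI http://example.com/p are illustrative.

```tcl
package require tdom

set events {}
proc nsStart {prefix uri}   { lappend ::events [list nsStart $prefix $uri] }
proc nsEnd   {prefix}       { lappend ::events [list nsEnd $prefix] }
proc elStart {name attlist} { lappend ::events [list start $name] }
proc elEnd   {name}         { lappend ::events [list end $name] }

# -namespace has to be given when the parser is created.
set parser [expat -namespace \
                  -startnamespacedeclcommand nsStart \
                  -endnamespacedeclcommand nsEnd \
                  -elementstartcommand elStart \
                  -elementendcommand elEnd]
$parser parse {<doc xmlns:p="http://example.com/p"><p:item/></doc>}

foreach event $events { puts $event }
$parser free
```

The nsStart event for prefix p is expected first and the matching nsEnd event last, with the element events for doc in between.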
- -commentcommand script
-
Specifies a Tcl command to associate with comments in the
document. The actual command consists of this option followed by
one argument: the comment data.
Example:
proc HandleComment {data} {
    puts stderr "Comment ==> $data"
}
$parser configure -commentcommand HandleComment
$parser parse {<test><!-- this is <obviously> a comment --></test>}
This would result in the following command being invoked:
HandleComment { this is <obviously> a comment }
- -notstandalonecommand script
-
This Tcl command is called if the document is not standalone
(it has an external subset or a reference to a parameter entity,
but does not have standalone="yes"). It is called with no
additional arguments.
- -startcdatasectioncommand script
-
Specifies a Tcl command to associate with the start of a CDATA
section. It is called with no additional arguments.
- -endcdatasectioncommand script
-
Specifies a Tcl command to associate with the end of a CDATA
section. It is called with no additional arguments.
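The two CDATA section handlers can be combined with a character data handler to see where a CDATA section begins and ends. A minimal sketch, assuming the tdom package is installed; the proc names and event labels are made up.

```tcl
package require tdom

set events {}
proc cdataStart {}     { lappend ::events cdata-start }
proc cdataEnd   {}     { lappend ::events cdata-end }
proc HandleText {data} { lappend ::events [list text $data] }

set parser [expat -startcdatasectioncommand cdataStart \
                  -endcdatasectioncommand cdataEnd \
                  -characterdatacommand HandleText]
$parser parse {<test><![CDATA[<raw> & unescaped]]></test>}

foreach event $events { puts $event }
$parser free
```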
- -elementdeclcommand script
-
Specifies a Tcl command to associate with element declarations.
The actual command consists of this option followed by two
arguments: the name of the element and the content model. The
content model argument is a Tcl list of four elements. The first
list element specifies the type of the XML element; the six
possible types are reported as "MIXED", "NAME", "EMPTY", "CHOICE",
"SEQ" or "ANY". The second list element reports the quantifier of
the content model in XML syntax ("?", "*" or "+") or is the empty
list. If the type is "MIXED", then the quantifier will be "{}",
indicating a PCDATA-only element, or "*", with the elements allowed
to intermix with PCDATA given as a Tcl list in the fourth list
element. If the type is "NAME", the name is the third list element;
otherwise the third list element is the empty list. If the type is
"CHOICE" or "SEQ", the fourth list element will contain a list of
content models built like this one. The "EMPTY", "ANY", and "MIXED"
types will only occur at top level.
Examples:
proc elDeclHandler {name content} {
    puts "$name $content"
}
set parser [expat -elementdeclcommand elDeclHandler]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA)>
]>
<test>foo</test>}
This would result in the following command being invoked:
elDeclHandler test {MIXED {} {} {}}
$parser reset
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test (a|b)>
]>
<test><a/></test>}
This would result in the following command being invoked:
elDeclHandler test {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}
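Because the content model is a nested list, it is natural to process it recursively. The helper below is a sketch in plain Tcl (8.5 or later) that turns a content model list back into DTD-like syntax. The proc name modelToString is hypothetical, and the code assumes that the children of a "MIXED" model are reported as "NAME" models, as in the "CHOICE" example above.

```tcl
# Convert a content model list {type quantifier name children}
# into DTD-like syntax.
proc modelToString {model} {
    lassign $model type quant name children
    switch -- $type {
        EMPTY - ANY {
            return $type
        }
        NAME {
            return $name$quant
        }
        MIXED {
            # Assumption: each child is itself a NAME model.
            set parts [list #PCDATA]
            foreach child $children {
                lappend parts [lindex $child 2]
            }
            return "([join $parts |])$quant"
        }
        CHOICE - SEQ {
            set parts {}
            foreach child $children {
                lappend parts [modelToString $child]
            }
            set sep [expr {$type eq "CHOICE" ? "|" : ","}]
            return "([join $parts $sep])$quant"
        }
    }
    error "unknown content model type \"$type\""
}

puts [modelToString {MIXED {} {} {}}]
puts [modelToString {CHOICE {} {} {{NAME {} a {}} {NAME {} b {}}}}]
```

Such a helper could be called from an -elementdeclcommand script with its second argument.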
- -attlistdeclcommand script
-
Specifies a Tcl command to associate with attlist declarations.
The actual command consists of this option followed by five
arguments. The attlist declaration handler is called for *each*
attribute, so a single attlist declaration with multiple attributes
declared will generate multiple calls to this handler. The
arguments are the name of the element this attribute belongs to, the
name of the attribute, the type of the attribute, the default value
(may be the empty list) and a required flag. If this flag is true and
the default value is not the empty list, then this is a "#FIXED"
default.
Example:
proc attlistHandler {elname name type default isRequired} {
    puts "$elname $name $type $default $isRequired"
}
set parser [expat -attlistdeclcommand attlistHandler]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test EMPTY>
<!ATTLIST test
id ID #REQUIRED
name CDATA #IMPLIED>
]>
<test/>}
This would result in the following commands being invoked:
attlistHandler test id ID {} 1
attlistHandler test name CDATA {} 0
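The combination of the default value and the required flag can be decoded with a few comparisons. The following plain Tcl sketch follows the rule stated above; the proc name describeDefault is hypothetical, and treating an empty default with a false flag as "#IMPLIED" is an assumption of this example.

```tcl
# Classify the last two arguments of an -attlistdeclcommand callback.
proc describeDefault {default isRequired} {
    if {$isRequired && $default ne ""} {
        # Required flag plus a default value means a fixed value.
        return "#FIXED \"$default\""
    }
    if {$isRequired} {
        return "#REQUIRED"
    }
    if {$default eq ""} {
        # Assumption: no default and not required is read as #IMPLIED.
        return "#IMPLIED"
    }
    return "default \"$default\""
}

puts [describeDefault {} 1]
puts [describeDefault {} 0]
puts [describeDefault 1.0 1]
```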
- -startdoctypedeclcommand script
-
Specifies a Tcl command to associate with the start of the
DOCTYPE declaration. This command is called before any DTD or
internal subset is parsed. The actual command consists of this
option followed by four arguments: the doctype name, the system
identifier, the public identifier and a boolean that indicates
whether the DOCTYPE has an internal subset.
- -enddoctypedeclcommand script
-
Specifies a Tcl command to associate with the end of the DOCTYPE
declaration. This command is called after processing any external
subset. It is called with no additional arguments.
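A short sketch of both DOCTYPE handlers working together, assuming the tdom package is installed. The proc names, the global doctype list and the sample document are illustrative.

```tcl
package require tdom

set doctype {}
set doctypeEnded 0

proc doctypeStart {name systemId publicId hasInternalSubset} {
    set ::doctype [list $name $systemId $publicId $hasInternalSubset]
}
proc doctypeEnd {} {
    set ::doctypeEnded 1
}

set parser [expat -startdoctypedeclcommand doctypeStart \
                  -enddoctypedeclcommand doctypeEnd]
$parser parse {<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test EMPTY>
]>
<test/>}
puts "doctype: $doctype, ended: $doctypeEnded"
$parser free
```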
- -paramentityparsing
never|notstandalone|always
-
"never" disables expansion of parameter entities, "always"
always expands them, and "notstandalone" expands them only if the
document isn't marked standalone="yes". The default is "never".
- -entitydeclcommand script
-
Specifies a Tcl command to associate with any entity
declaration. The actual command consists of this option followed by
seven arguments: the entity name, a boolean identifying parameter
entities, the value of the entity, the base URI, the system
identifier, the public identifier and the notation name. Depending
on the type of entity declaration, some of these arguments may be the
empty list.
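To illustrate the seven arguments, here is a sketch that records the declaration of one internal general entity. It assumes the tdom package is installed; the proc name entityDecl and the entity greeting are made up.

```tcl
package require tdom

set entities {}
proc entityDecl {name isParam value base systemId publicId notation} {
    # Keep only the arguments that matter for an internal entity.
    lappend ::entities [list $name $isParam $value]
}

set parser [expat -entitydeclcommand entityDecl]
$parser parse {<!DOCTYPE test [
<!ENTITY greeting "hello">
<!ELEMENT test (#PCDATA)>
]>
<test>&greeting;</test>}
puts $entities
$parser free
```

For an internal entity like this one, the system identifier, public identifier and notation name are expected to be empty.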
- -ignorewhitecdata boolean
-
If this flag is set, element content which contains only
whitespace isn't reported with the -characterdatacommand.
- -ignorewhitespace boolean
- Another name for -ignorewhitecdata; see
there.
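The effect of this flag is easiest to see on an indented document. The sketch below assumes the tdom package is installed; the proc name and the sample document are illustrative.

```tcl
package require tdom

set texts {}
proc HandleText {data} {
    lappend ::texts $data
}

set parser [expat -characterdatacommand HandleText \
                  -ignorewhitecdata 1]
# Indentation and the content of <b> consist of whitespace only.
$parser parse "<doc>\n  <a>text</a>\n  <b> </b>\n</doc>"
puts $texts
$parser free
```

Without the flag, the handler would also be called for the line breaks and indentation between the elements.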
- -handlerset name
-
This option sets the Tcl handler set scope for the configure
options. Any option/value pairs following this option in the same
call to the parser modify the named Tcl handler set. If you
don't use this option, you are modifying the default Tcl handler
set, named "default".
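Two independent handler sets on one parser might look like the following sketch, which assumes the tdom package is installed. The handler set names counter and logger and the proc names are hypothetical.

```tcl
package require tdom

set count 0
set log {}
proc countStart {name attlist} { incr ::count }
proc logStart   {name attlist} { lappend ::log $name }

set parser [expat]
# Each configure call addresses one named handler set.
$parser configure -handlerset counter -elementstartcommand countStart
$parser configure -handlerset logger  -elementstartcommand logStart

$parser parse {<a><b/></a>}
puts "count=$count log=$log"
$parser free
```

Both scripts are registered for the same event and are called in turn for every start tag.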
- -noexpand boolean
-
Normally, the parser will try to expand references to entities
defined in the internal subset. If this option is set to a true
value, these entities are not expanded, but reported literally via the
default handler. Warning: If you set this option to true and
don't install a default handler (with the -defaultcommand option)
for every handler set of the parser, all internal entities are
silently lost for the handler sets without a default handler.
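The warning above suggests always pairing -noexpand with a default handler. The following is a sketch of that pairing, assuming the tdom package is installed and that the -defaultcommand script receives the unhandled data as its single argument; the proc names and the entity greeting are made up.

```tcl
package require tdom

set raw {}
set text ""
# Receives everything no other handler takes, including the
# unexpanded entity references.
proc HandleDefault {data} { lappend ::raw $data }
proc HandleText {data}    { append ::text $data }

set parser [expat -noexpand 1 \
                  -defaultcommand HandleDefault \
                  -characterdatacommand HandleText]
$parser parse {<!DOCTYPE test [<!ENTITY greeting "hello">]><test>say &greeting;</test>}

puts "character data: $text"
puts "default handler data: $raw"
$parser free
```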
- -useForeignDTD <boolean>
- If <boolean> is true and the document does not have an
external subset, the parser will call the -externalentitycommand
script with empty values for the systemId and publicId arguments.
This option must be set before the first piece of data is parsed.
Setting this option after parsing has started has no effect.
The default is not to use a foreign DTD. The default is restored
after resetting the parser. Please note that a
-paramentityparsing value of "never" (which is the default)
suppresses any call to the -externalentitycommand script. Please
note also that, if the document doesn't have an internal subset
either, the -startdoctypedeclcommand and -enddoctypedeclcommand
scripts, if set, are not called.
A script invoked for any of the parser callback commands, such
as -elementstartcommand, -elementendcommand, etc., may return a
return code other than "ok" or "error": all callbacks may in
addition return "break" or "continue".
If a callback script returns an "error" return code, then
processing of the document is terminated and the error is
propagated in the usual fashion.
If a callback script returns a "break" return code, then all
further processing by every handler script of this Tcl handler
set is suppressed for the rest of the parsing. This does not influence
any other handler set.
If a callback script returns a "continue" return code, then
processing of the current element, and its children, ceases for
every handler script of this Tcl handler set and processing
continues with the next (sibling) element. This does not influence
any other handler set.
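The "continue" return code can be used to skip a subtree. The sketch below assumes the tdom package is installed; the element name private and the proc name skipPrivate are made up for illustration.

```tcl
package require tdom

set visited {}
proc skipPrivate {name attlist} {
    if {$name eq "private"} {
        # Skip this element and its children for this handler set.
        return -code continue
    }
    lappend ::visited $name
}

set parser [expat -elementstartcommand skipPrivate]
$parser parse {<doc><private><secret/></private><public/></doc>}
puts $visited
$parser free
```

Following the description above, the handler is not expected to see the secret element, while the sibling element public is reported again.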