XML::AutoWriter, XML::Filter::Validator?
by Barrie Slaymaker
Jul 25 2000 12:58PM
[[CCed back to perl-xml for general commentary]]
Laurent CAPRANI wrote, in part:
>
> 1. SAX handler
>
> I don't personally need SAX handler methods. Only the startTag() / endTag()
> interface may be useful to me, since I will surely override them.
I misunderstood, then: I thought you wanted to use your existing
SAX driver chain and bolt on an XML::ValidWriter::SAX handler
object.
> 2. SAX driver
>
> Well, I understand you may feel reluctant toward the PerlSAX design. I would surely
> agree with any attempt to improve PerlSAX performance.
I'm just reluctant to force SAX where it's not a good fit. SAX is
a Good Thing (tm), and I do want to be SAX-compliant, probably
by refactoring the code a bit into XML::Validator and XML::Validator::SAX
(or XML::Filter::Validator). Naming suggestions welcome. It won't
happen immediately, though; there's too much else to do.
> 3. Specifying autotag attributes
The way this currently works is:
use XML::Doctype ;
use XML::AutoWriter ;
# AutoWriter subclasses ValidWriter and provides autotagging
my $writer = XML::AutoWriter-> new(
   DOCTYPE => new XML::Doctype( 'foo', SYSTEM_ID => 'fooml.dtd' )
) ;
## fooml.dtd contains <!ATTLIST foo a1 CDATA #REQUIRED > . fooml.dtd
## may or may not have an <!ELEMENT foo ... > in it.
$writer-> getDoctype->element_decl('foo')->default_on_write('value') ;
I haven't done anything about callbacks.
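To make the effect of default_on_write('value') concrete, here is a plain-Perl sketch of the idea. It is not XML::AutoWriter's implementation and does not call the module: a per-element table of defaults (the hypothetical %default_on_write) is consulted whenever a start tag is written, and any attribute the caller left out gets its default. The helper names start_tag_text and xml_escape are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for what default_on_write('value') records:
# element name => { attribute name => value to use if none is given }.
my %default_on_write = ( foo => { a1 => 'value' } );

# Minimal escaping so attribute values stay well-formed.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a start tag, filling in defaulted attributes the caller omitted.
sub start_tag_text {
    my ( $tag, %attrs ) = @_;
    my $defaults = $default_on_write{$tag} || {};
    for my $name ( keys %$defaults ) {
        $attrs{$name} = $defaults->{$name} unless exists $attrs{$name};
    }
    my $attr_text = join '',
        map { my $v = xml_escape( $attrs{$_} ); qq{ $_="$v"} } sort keys %attrs;
    return "<$tag$attr_text>";
}

print start_tag_text('foo'), "\n";                 # <foo a1="value">
print start_tag_text( 'foo', a1 => 'x&y' ), "\n";  # <foo a1="x&amp;y">
```

An explicitly supplied value always wins over the default, which is what lets a #REQUIRED attribute be satisfied without every caller having to pass it.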
> 4. Specifying the autotag path
>
> I suppose that pre-code() gets called when startTag(<c>) is called within an
> <a>. It allows the user to specify the path (here <b>).
Yup. It would be a pattern spec w/ callbacks that allow you to emit
preamble, alternative content, and/or postamble.
> My idea was to specify more automation on the DTD side, for example with a
> special kind of attribute for path selection.
> For example, the DTD allows a <P> inside <BODY>, inside a <TABLE><ROW><CELL> or
> inside a <LIST><ITEM>.
> The "extended" DTD would require an additional qualifier for <P> (similar to a
> required attribute), to select the right path.
If I understand correctly, you want to call something like
$writer-> startTag( 'P', NESTED_IN => 'TABLE' ) ;
. Right now, you can hardwire path selection in the subclass, as above,
or by calling startTag() with the desired intermediate tag:
$writer-> startTag( 'TABLE' ) ;
$writer-> startTag( 'P' ) ;
or, if you're using the functional interface,
TABLE ;
P ;
. An example lies below. How does having nesting specified as an attribute
(if I've understood correctly) help?
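For what it's worth, here is a plain-Perl sketch of how a NESTED_IN-style hint could choose among the paths in Laurent's BODY example: enumerate every chain of elements from the parent down to P, then keep the one that passes through the hinted element, or the shortest one if there is no hint. The %children table and the all_paths() and choose_path() helpers are hypothetical; NESTED_IN is only the option name floated above, not an existing XML::AutoWriter parameter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical content models for the example above: P may appear
# directly in BODY, or via TABLE/ROW/CELL, or via LIST/ITEM.
my %children = (
    BODY  => [ 'P', 'TABLE', 'LIST' ],
    TABLE => ['ROW'],
    ROW   => ['CELL'],
    CELL  => ['P'],
    LIST  => ['ITEM'],
    ITEM  => ['P'],
    P     => [],
);

# Every chain of elements leading from $parent down to $target
# (assumes the content models contain no recursion).
sub all_paths {
    my ( $parent, $target ) = @_;
    my @found;
    for my $kid ( @{ $children{$parent} || [] } ) {
        if ( $kid eq $target ) {
            push @found, [$kid];
        }
        else {
            push @found, map { [ $kid, @$_ ] } all_paths( $kid, $target );
        }
    }
    return @found;
}

# Choose the path through the NESTED_IN hint, else the shortest one.
sub choose_path {
    my ( $parent, $target, %opts ) = @_;
    my @paths = sort { @$a <=> @$b } all_paths( $parent, $target );
    if ( defined $opts{NESTED_IN} ) {
        my $hint = $opts{NESTED_IN};
        @paths = grep { my $path = $_; grep { $_ eq $hint } @$path } @paths;
    }
    return $paths[0];
}

print join( ' ', @{ choose_path( 'BODY', 'P' ) } ), "\n";
# P
print join( ' ', @{ choose_path( 'BODY', 'P', NESTED_IN => 'TABLE' ) } ), "\n";
# TABLE ROW CELL P
```

Without a hint the shortest path acts as the "default" path; the hint only matters when the DTD leaves more than one legal route.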
> 5. SGML-ish things
>
> I thought that inhibiting some autotagging might facilitate debugging and provide a
> "default" path.
Would something like
$writer-> getDoctype->element_decl( 'foo' )->autoTagging( 0 ) ;
be OK to shut it off? It doesn't 'extend' the DTD syntax, but it would work:
the autotag search would stop whenever it reached an element with
autotagging turned off.
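Here's a rough sketch, in plain Perl and not taken from the XML::AutoWriter source, of an autotag search with that cutoff: a breadth-first search over simplified content models finds the shortest chain of elements to open, and never opens an element listed in the hypothetical %no_autotag set. The autotag_path() name and its argument order are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Breadth-first search for the shortest chain of elements to open so that
# $target can legally appear inside $parent. $children maps an element to
# the elements its (simplified) content model allows; elements flagged in
# $no_autotag are never opened implicitly, which cuts the search off there.
sub autotag_path {
    my ( $children, $no_autotag, $parent, $target ) = @_;
    my @queue = ( [$parent] );
    my %seen  = ( $parent => 1 );
    while ( my $path = shift @queue ) {
        for my $kid ( @{ $children->{ $path->[-1] } || [] } ) {
            # Found it: return the intermediate elements plus the target.
            return [ @{$path}[ 1 .. $#$path ], $kid ] if $kid eq $target;
            next if $seen{$kid}++ || $no_autotag->{$kid};
            push @queue, [ @$path, $kid ];
        }
    }
    return;    # no legal path: undef in scalar context
}

# Content models from the toy DTD in this message.
my %children = ( TABLE => ['TR'], TR => ['TD'], TD => ['P'], P => [] );

my $path = autotag_path( \%children, {}, 'TABLE', 'P' );
print join( ' ', @$path ), "\n";                  # TR TD P

my %no_autotag = ( TR => 1 );
my $none = autotag_path( \%children, \%no_autotag, 'TABLE', 'P' );
print defined $none ? "found\n" : "no path\n";    # no path
```

Breadth-first order is what makes the first match the shortest path, so "no hint" naturally means "fewest implied tags".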
> The XSLTish thing looks promising. I should try it before requesting special
> features on the DTD side.
>
> Just an idea: XSLT-ish patterns could be extended to trigger callbacks on
> character data (inserting Perl regexps into patterns?).
Interesting thought. TODO.
> 6. Conditional tags
>
> Converters from flat formats need to say "open this element if it is not
> already opened" and "close this element unless it is already closed" and they
> use it a lot.
The autotagger does something like that, but it does not let you explicitly
open a tag conditionally (example follows). Here are some possible OO and
functional APIs; what do you think?
1)
$writer-> startTag( 'P') unless $writer->tagOpen( 'P' ) ;
$writer-> endTag( 'P') if $writer->tagOpen( 'P' ) ;
P unless tagIsOpen( 'P' ) ;
end_P if tagIsOpen( 'P' ) ;
2)
$writer-> ensureStartTag( 'P' ) ;
$writer-> ensureEndTag( 'P' ) ;
ensure_P ;
ensure_end_P ;
3)
$writer-> assertStartTag( 'P' ) ;
$writer-> assertEndTag( 'P' ) ;
assert_P ;
assert_end_P ;
4)
$writer-> condStartTag( 'P' ) ;
$writer-> condEndTag( 'P' ) ;
condStartTag( 'P' ) ;
condEndTag( 'P' ) ;
. Here's a toy example to help illustrate how it behaves now:
[barries@jester XML-DocType]$ make pure_all ; perl toy
<?xml version="1.0"?>
<HTML> 0<TABLE><TR><TD><P>a</P><P>bc</P></TD></TR></TABLE></HTML>
######################################################################
#!/usr/local/bin/perl -w
use XML::Doctype NAME => 'HTML', DTD_TEXT => <<TOHERE ;
<!-- HTML is undefined, so its content model is assumed to be 'ANY' -->
<!ELEMENT TABLE ( TR )* >
<!ELEMENT TR ( TD )* >
<!ELEMENT TD ( P )* >
<!ELEMENT P (#PCDATA) >
TOHERE
use XML::AutoWriter qw( :all :dtd_tags ) ;
xmlDecl ;
characters( '0' ) ;
TABLE ;
characters( 'a' ) ;
P ;
characters( 'b' ) ;
characters( 'c' ) ;
endAllTags ;
open( ME, "<$0" ) or die $! ;
print "\n", "#" x 70, "\n", <ME> ;
> There must be some way to provide this. It might be a query on open elements.
It's pretty easy to do that, and will be easier if I factor XML::Validator out
of XML::ValidWriter.
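As a rough illustration of the "query on open elements", here is a toy writer in plain Perl that keeps a stack of open tags. tagIsOpen, ensureStartTag and ensureEndTag borrow names from options 1 and 2 above, but ToyWriter is hypothetical and is not XML::ValidWriter's or XML::AutoWriter's code; it does no validation and no autotagging.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal writer (NOT the XML::AutoWriter API) that tracks
# open elements on a stack so it can answer "is this tag open?".
package ToyWriter;

sub new { my $class = shift; return bless { stack => [], out => '' }, $class }

sub startTag {
    my ( $self, $tag ) = @_;
    push @{ $self->{stack} }, $tag;
    $self->{out} .= "<$tag>";
    return;
}

# Closes the innermost open element; dies if it is not the one named.
sub endTag {
    my ( $self, $tag ) = @_;
    my $top = pop @{ $self->{stack} };
    die "endTag('$tag') but innermost open tag is '" . ( $top // 'none' ) . "'\n"
        unless defined $top && $top eq $tag;
    $self->{out} .= "</$tag>";
    return;
}

sub characters { my ( $self, $text ) = @_; $self->{out} .= $text; return }

# The query on open elements: true if $tag is anywhere on the stack.
sub tagIsOpen {
    my ( $self, $tag ) = @_;
    return scalar grep { $_ eq $tag } @{ $self->{stack} };
}

# "Open this element if it is not already opened."
sub ensureStartTag {
    my ( $self, $tag ) = @_;
    $self->startTag($tag) unless $self->tagIsOpen($tag);
    return;
}

# "Close this element unless it is already closed"; anything nested
# inside it is closed first so the output stays well-formed.
sub ensureEndTag {
    my ( $self, $tag ) = @_;
    return unless $self->tagIsOpen($tag);
    $self->endTag( $self->{stack}[-1] ) while $self->{stack}[-1] ne $tag;
    $self->endTag($tag);
    return;
}

package main;

my $w = ToyWriter->new;
$w->ensureStartTag('P');
$w->characters('a');
$w->ensureStartTag('P');    # no-op: P is already open
$w->characters('b');
$w->ensureEndTag('P');
$w->ensureEndTag('P');      # no-op: P is already closed
print $w->{out}, "\n";      # <P>ab</P>
```

Closing the nested elements inside ensureEndTag is one design choice; a stricter writer could instead die when the named element isn't innermost, as endTag does.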
- Barrie