RE: What's So Great about SAX? (ie. Future Indecisions)
by Grant McLean
Oct 7 2002 11:03PM
From: Morbus Iff [mailto:morbus@[...].com]
> >SAX is a huge advance over the XML::Parser Handler API for a
>
> Thanks for the expose. I'll wax a bit.
>
> > - pluggable - if your code is written to the SAX API you
> > can use any SAX parser without changing your code
>
> Not immediately useful to me, since expat is the only library
> currently ported (and bundle-able) to every OS I need it to be in
> (my software is one of the rare few that turns into a "don't need
> Perl installed" application for Mac and Windows).
Note also that XML::SAX comes with an extremely portable parser
written entirely in Perl (XML::SAX::PurePerl). Unfortunately,
it needs Perl 5.8 to support encodings other than UTF-8.
> > - flexible - your data source does not even need to be an
> > XML document (eg: you can drive your SAX pipeline from
> > a database query)
>
> That's kinda neat, although I do some pre-processing before
> sending to XML::Simple - enough so that I'm always sending a
> string of XML, not a file.
>
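On the "data source need not be XML" point, here is a minimal sketch in
plain Perl (no CPAN modules) of the idea. It assumes the rows came from a
database query; they are hard-coded here. RowDriver and TextCollector are
hypothetical names made up for illustration. The event method names follow
the Perl SAX2 convention, but the element hashes are pared down to just a
Name key, so this is not a drop-in for a real XML::SAX pipeline.

```perl
use strict;
use warnings;

# Hypothetical handler: collects the text of each element into a hash.
package TextCollector;

sub new {
    my ($class) = @_;
    return bless { data => {}, current => undef }, $class;
}
sub start_document { my ($self) = @_; $self->{data} = {}; return }
sub start_element  { my ($self, $el) = @_; $self->{current} = $el->{Name}; return }
sub characters {
    my ($self, $chars) = @_;
    return unless defined $self->{current};
    $self->{data}{ $self->{current} } .= $chars->{Data};
    return;
}
sub end_element  { my ($self) = @_; $self->{current} = undef; return }
sub end_document { my ($self) = @_; return $self->{data} }

# Hypothetical driver: turns rows (hashrefs) into SAX-style events
# instead of parsing an XML document.
package RowDriver;

sub new {
    my ($class, %args) = @_;
    return bless { Handler => $args{Handler} }, $class;
}
sub parse_rows {
    my ($self, @rows) = @_;
    my $h = $self->{Handler};
    my @results;
    for my $row (@rows) {
        $h->start_document({});
        $h->start_element({ Name => 'item' });
        for my $col (sort keys %$row) {
            $h->start_element({ Name => $col });
            $h->characters({ Data => $row->{$col} });
            $h->end_element({ Name => $col });
        }
        $h->end_element({ Name => 'item' });
        push @results, $h->end_document({});
    }
    return @results;
}

package main;

# Stand-in for the result of a database query.
my @rows = ( { 'dc:title' => 'boo', 'dc:date' => '2002-10-08' } );

my $driver = RowDriver->new( Handler => TextCollector->new );
my ($item) = $driver->parse_rows(@rows);

print $item->{'dc:title'}, "\n";    # prints "boo"
```

The handler never knows whether its events came from expat, a pure-Perl
parser or a loop over rows, which is the whole point of coding to the API.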
> >I'm not entirely clear on what you're trying to do with
> >namespaces. Do you want your hashref keys to be in Clarkian
> >notation eg: '{http://purl.org/dc/elements/1.1/}date' or
> >do you want to normalise the prefixes used eg: 'dc:date'?
>
> Nope, not really. The biggest problem is:
>
> - I assume my data is going to be in one data structure, but
> if someone prefixes the data with a namespace besides the
> implied default, I get a different structure that breaks
> my assumption:
>
> assuming:
> <item><dc:title>boo</dc:title></item>
> == $item->{dc:title}
>
> breaks my thingy:
> <item><dublincore:title>boo</dublincore:title></item>
> != $item->{dc:title} but rather $item->{dublincore:title}
That does look like the normalisation option I referred to.
So if you got a document like this:
    <rdf:RDF
        xmlns="http://purl.org/rss/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:theonetruedublincore="http://purl.org/dc/elements/1.1/">
      <theonetruedublincore:date>2002-10-08</theonetruedublincore:date>
    </rdf:RDF>
then you want to treat it as if it were:
    <rdf:RDF
        xmlns="http://purl.org/rss/1.0/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:date>2002-10-08</dc:date>
    </rdf:RDF>
and slurp it into a hash like this:
    {
        'xmlns'     => 'http://purl.org/rss/1.0/',
        'xmlns:rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
        'xmlns:dc'  => 'http://purl.org/dc/elements/1.1/',
        'dc:date'   => '2002-10-08'
    };
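Stripped of any SAX machinery, the renaming itself is just a lookup from
namespace URI to preferred prefix. A minimal sketch in plain Perl, assuming
element hashes with the usual SAX2 keys (Name, Prefix, LocalName,
NamespaceURI). normalise_name and clark_name are hypothetical helpers for
illustration only, not functions from any module; clark_name shows the
Clarkian alternative for comparison.

```perl
use strict;
use warnings;

# Namespace URI => the prefix we would like to see in hash keys.
my %prefix_for = (
    'http://purl.org/dc/elements/1.1/' => 'dc',
);

# Hypothetical helper: return 'preferred-prefix:localname' when the
# element's namespace is in the map, otherwise the name unchanged.
sub normalise_name {
    my ($el) = @_;
    my $uri = $el->{NamespaceURI};
    return $el->{Name}
        unless defined($uri) && exists $prefix_for{$uri};
    return join ':', $prefix_for{$uri}, $el->{LocalName};
}

# Hypothetical helper: return the name in Clarkian notation,
# '{namespace-uri}localname', or the bare local name if there is
# no namespace.
sub clark_name {
    my ($el) = @_;
    my $uri = $el->{NamespaceURI};
    return $el->{LocalName} unless defined($uri) && length($uri);
    return '{' . $uri . '}' . $el->{LocalName};
}

my $el = {
    Name         => 'theonetruedublincore:date',
    Prefix       => 'theonetruedublincore',
    LocalName    => 'date',
    NamespaceURI => 'http://purl.org/dc/elements/1.1/',
};

print normalise_name($el), "\n";    # prints "dc:date"
print clark_name($el), "\n";        # prints "{http://purl.org/dc/elements/1.1/}date"
```

Elements in unmapped namespaces keep whatever prefix the document author
chose, so the map only needs to list the namespaces you actually care about.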
The way I'd see that working with SAX is something like:
    use XML::SAX::Machines qw( :all );
    use XML::Filter::NSNormalise;
    use XML::Simple;

    my $p = Pipeline(
        XML::Filter::NSNormalise->new(
            map => {
                'http://purl.org/dc/elements/1.1/' => 'dc',
                'http://purl.org/rss/1.0/modules/syndication/' => 'syn'
            }
        )
        => XML::Simple->new(
            keyattr => {}
        )
    );

    my $ref = $p->parse_uri('./rss.xml');
An off-the-cuff version of XML::Filter::NSNormalise is attached.
Cheers
Grant
Attachments:
NSNormalise.pm