perl-xml
Re: Getting encoding declaration with XML::SAX
by Timothy Appnel
Mar 1 2006 9:58AM
Thanks for the replies. I knew about xml_decl, but I also read that
it's deprecated, so I counted it out. I was under the impression that a
replacement was included in SAX 2.1. The Encoding key in
set_document_locator was the closest thing I could find.

Nevertheless, I ended up rectifying the problem by converting anything
that isn't UTF-8 before parsing. I do a quick regex to check whether an
encoding has been declared, so I know if I need to do a conversion and,
if so, what I'm converting from. So all documents get run through the
parser as UTF-8 and output as UTF-8.
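
For anyone who wants the gist without digging through the module, here
is a rough sketch of that idea. It is not the code from
XML::Atom::Syndication; the helper names declared_encoding and to_utf8
are made up for illustration, and I'm assuming the document arrives as
raw bytes in an ASCII-compatible encoding (so UTF-16 and a leading BOM
are not handled). It also rewrites the declaration to say UTF-8, on the
assumption that the parser would otherwise trust the old label.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode find_encoding );

# Hypothetical helper: return the encoding named in the XML declaration,
# or undef if the document does not declare one.
sub declared_encoding {
    my ($xml) = @_;
    return $xml =~ /\A<\?xml\b[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/
        ? $2
        : undef;
}

# Hypothetical helper: take raw bytes and return UTF-8 bytes, with the
# encoding declaration rewritten to match the new encoding.
sub to_utf8 {
    my ($bytes) = @_;
    my $enc = declared_encoding($bytes);

    # No declaration means UTF-8 by default here; nothing to convert.
    return $bytes if !defined $enc || $enc =~ /\Autf-?8\z/i;

    my $obj = find_encoding($enc)
        or die "Unknown encoding '$enc'\n";

    # Decode to Perl characters, dying on malformed input rather than
    # silently mangling it.
    my $chars = $obj->decode($bytes, Encode::FB_CROAK());

    # Relabel the declaration so it agrees with the converted content.
    $chars =~ s/(\A<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']+\2/$1${2}UTF-8$2/;

    return encode('UTF-8', $chars);
}

my $latin1 = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<t>caf\xE9</t>\n};
my $utf8   = to_utf8($latin1);
print $utf8;
```

The result of to_utf8 can then be handed to the parser as a string. A
regex like this is only a quick check, not a real parse of the
declaration, so treat it as a starting point.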

I would have liked to maintain the encoding, but a Japanese-speaking
colleague of mine convinced me that the encoding doesn't really matter
to the end user as long as it can be parsed and the content doesn't
get mangled.

For anyone who's interested, the whole system is on CPAN already.

http://search.cpan.org/~tima/XML-Atom-Syndication-0.9_07/

More specifically the conversion process I detailed happens in the
init method of XML::Atom::Syndication::Thing.

Thanks again to all who replied.

<tim/> 

On 2/28/06, A. Pollock <flipomatic@[...].com>  wrote:
>  This works for me:
> 
>  sub xml_decl {
>     my ($self, $data) = @_;
>     print $data->{Encoding}, "\n";
>  }
> 
>  Arvin
> 
>  > ----- Original Message -----
>  > From: "Timothy Appnel" <tappnel@[...].com>
>  > To: perl-xml@[...].com
>  > Subject: Getting encoding declaration with XML::SAX
>  > Date: Tue, 28 Feb 2006 16:08:01 -0500
>  >
>  >
>  > I'm trying to write some code that will parse an XML document and
>  > optionally reserialize it back. I have to expect character sets other
>  > than utf-8 will be passed in. The problem I'm running into is getting
>  > the declared encoding out of the XML while parsing with XML::SAX.
>  > set_document_locator doesn't pick up the encoding declaration. I've
>  > tried a couple of different parsers and feeds with different
>  > encodings. Still no love. Here is a test script I'm running...
>  >
>  > #!/usr/bin/perl -w
>  > use strict;
>  >
>  > use XML::SAX::ParserFactory;
>  > my $handler = XML::SAX::Foo->new();
>  > my $p = XML::SAX::ParserFactory->parser(Handler => $handler);
>  > $p->parse_file('atom.xml'); # http://www.sixapart.jp/atom.xml
>  >
>  > package XML::SAX::Foo;
>  > use base qw( XML::SAX::Base );
>  >
>  > use Data::Dumper;
>  > sub set_document_locator { print Dumper(@_) }
>  >
>  > What am I doing wrong here? Why is the declared encoding information so
>  > difficult to determine?
>  >
>  > <tim/>
>  > --
>  > Timothy Appnel
>  > http://www.timaoutloud.org/
>  >
>  > _______________________________________________
>  > Perl-XML mailing list
>  > Perl-XML@[...].com
>  > To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


--
Timothy Appnel
http://www.timaoutloud.org/

_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs