perl-xml
Re: Getting encoding declaration with XML::SAX
by Dominic Mitchell
Feb 28 2006 2:23PM
Timothy Appnel wrote:
>  I'm trying to write some code that will parse an XML document and
>  optionally reserialize it back. I have to expect that character sets other
>  than utf-8 will be passed in. The problem I'm running into is getting
>  the declared encoding out of the XML while parsing with XML::SAX.
>  set_document_locator doesn't pick up the encoding declaration. I've
>  tried a couple of different parsers and feeds with different
>  encodings. Still no love. Here is a test script I'm running...
>  
>  #!/usr/bin/perl -w
>  use strict;
>  
>  use XML::SAX::ParserFactory;
>  my $handler = XML::SAX::Foo->new();
>  my $p = XML::SAX::ParserFactory->parser(Handler => $handler);
>  $p->parse_file('atom.xml'); # http://www.sixapart.jp/atom.xml
>  
>  package XML::SAX::Foo;
>  use base qw( XML::SAX::Base );
>  
>  use Data::Dumper;
>  sub set_document_locator { print Dumper(@_) }
>  
>  What am I doing wrong here? Why is the declared encoding information so
>  difficult to determine?

The declared encoding is only used by the parser to decode the
incoming XML.  Once it's been parsed, the character data is all in
Unicode (as far as I know), so your handler should be handed Perl
character strings whatever encoding the document declared.  From
there, you can use Encode.pm to turn it into any charset you want
when you reserialize (although I always recommend sticking with UTF-8
if possible).
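
A minimal sketch of that last step, assuming the text has already
arrived from the parser as a Perl character string. The helper name
reencode() and the sample string are made up for illustration; only
Encode::encode() is the real library call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into bytes
# in the requested charset, ready to be written out.
sub reencode {
    my ($text, $charset) = @_;
    return encode($charset, $text);
}

# Stand-in for text delivered by a SAX characters() event:
# "cafe" with an e-acute, four characters long.
my $text = "caf\x{e9}";

my $utf8   = reencode($text, 'UTF-8');
my $latin1 = reencode($text, 'ISO-8859-1');

printf "characters:       %d\n", length $text;
printf "UTF-8 bytes:      %d\n", length $utf8;
printf "ISO-8859-1 bytes: %d\n", length $latin1;
```

The e-acute takes two bytes in UTF-8 and one in ISO-8859-1, so the
same four characters come out as five bytes or four depending on the
charset chosen at output time.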

-Dom