Perl, XML and UTF-8
by Claude Paroz
May 29 2006 3:22PM
Hi,
I have some Perl (5.8.7) code that reads XML (UTF-8 encoded) with
XML::Simple or XML::LibXML, and writes the content back to an HTML page
through CGI.
Snippet:
use XML::LibXML;
use CGI qw/:standard/;
use Locale::gettext;

my $q = CGI->new;
my $xml = XML::LibXML->new();
my $data = $xml->parse_file($xmlfile);   # $xmlfile: path to the XML file, set elsewhere
my $root = $data->getDocumentElement;
my @lines = $root->getElementsByTagName('sometag');

print $q->header(-type => 'text/html', -charset => 'UTF-8',
                 -encoding => "UTF-8");
print $q->start_html(-title => gettext("My title"),
                     -encoding => "UTF-8");
# Text of a 'subtag' element found under the first 'sometag' element
print $q->h1($lines[0]->getElementsByTagName('subtag')->item(0)->textContent);
print $q->end_html;
************* End of Code ***************
My problem is that special characters (accented letters) are not encoded
correctly when passed to the HTML output. Each special character is
displayed as a question mark inside a square. However, the utf8::is_utf8
function returns 1 for these strings.
I also noticed that when a string in the XML file contains certain other
special characters (e.g. ™ (trademark)), the encoding comes out fine in the
resulting HTML. Weird...
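
For illustration, here is a minimal sketch that reproduces the same pattern outside of CGI and XML::LibXML. It relies on one assumption that is not confirmed for the script above: that the output filehandle has no encoding layer. The helper bytes_printed and the sample strings are hypothetical names made up for this example. On such a handle, Perl writes a string whose characters all fit below U+0100 as single Latin-1 bytes, even when utf8::is_utf8 reports 1, and falls back to writing UTF-8 bytes once the string contains a character above U+00FF.

```perl
use strict;
use warnings;

# Hypothetical helper: print a character string to an in-memory filehandle
# that has no encoding layer, and return the raw bytes that were written.
sub bytes_printed {
    my ($text) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $text;
    }
    close($fh);
    return $buffer;
}

# "café": every character is below U+0100.
my $accented = "caf\x{e9}";
utf8::upgrade($accented);      # store it internally as UTF-8, flag set

# "café™": the trademark sign U+2122 is above U+00FF.
my $with_mark = "caf\x{e9}\x{2122}";

print utf8::is_utf8($accented) ? 1 : 0, "\n";            # prints 1
print unpack('H*', bytes_printed($accented)), "\n";      # prints 636166e9
print unpack('H*', bytes_printed($with_mark)), "\n";     # prints 636166c3a9e284a2
```

In the first string, é leaves Perl as the single byte e9, which a browser told to expect UTF-8 cannot decode. In the second, é becomes c3 a9 and ™ becomes e2 84 a2, which is valid UTF-8. Whether this is what happens in the CGI script depends on how STDOUT is set up there.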
What could be the problem?
Regards.
Claude
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs