perl-xml
Re: enabling utf8 encoding for sax parer XML::SAX::Expat
by Michael Ludwig
Apr 29 2009 1:19AM
eyal edri wrote:

>  can it be that since i'm using LWP::UserAgent , perl somehow changes
>  the encoding while d/l?

Just ran into this myself. I think LWP does that, not Perl. And it may
even be the correct thing to do, given that what you send over the wire
are octets, not strings. See the note in the perldoc for HTTP::Request:

| $r->content( $bytes )
|
| Note that the content should be a string of bytes. Strings in perl can
| contain characters outside the range of a byte. The Encode module can
| be used to turn such strings into a string of bytes.

So it is documented, if rather tersely. The relevant concepts are
explained elsewhere. See perldoc:

* perluniintro
* perlunitut
* perlunicode
* perlunifaq
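To make the distinction between characters and octets concrete, here is
a minimal sketch using only the core Encode module. The sample string
and the variable names are made up for illustration; they are not from
the original question.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string. "\x{20AC}" is the euro sign, a single character
# that does not fit into one byte.
my $chars = "price: \x{20AC}5";

# Turn the characters into UTF-8 octets, as the HTTP::Request note asks.
my $octets = encode( 'UTF-8', $chars );

# length() counts characters on the first string and octets on the
# second: the euro sign takes three octets in UTF-8.
my $char_len  = length $chars;
my $octet_len = length $octets;

# Decoding the octets gives back the original character string.
my $roundtrip = decode( 'UTF-8', $octets );
```

The two lengths differ, which is a quick way to see that the encoded
string is a different thing from the character string it came from.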

>  $userAgent->request($request, $file) ?
> 
>  should i add any sort of HTTP header with encoding like the ones
>  returned in HTTP responses?

You should indicate the encoding in the Content-Type, yes.

The issue here, however, is unrelated to that. Use the Encode module
to convert your characters to octets. This yields a string of octets,
rather than a string of characters. In the following sample snippet,
I convert the output of an XSL transformation, which is a string of
UTF-8 characters, into a string of octets, which is what needs to go
over the wire. This string of octets has the UTF-8 flag unset, which
can be checked by using Encode::is_utf8().

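   use Encode qw(encode);
   use HTTP::Request;

   # Encode the stylesheet output (characters) into UTF-8 octets.
   my $octets = encode( 'utf8', $ss->output_string( $result ) );
   my $req = HTTP::Request->new( POST => $url );

   $req->content_type('text/xml; charset=utf-8');
   $req->content( $octets );   # send the octets, not the characters
   my $res = $ua->request( $req );
   if ( ! $res->is_success ) {
       print STDERR $res->status_line, "\n";
   }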
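The flag check mentioned above can be tried in isolation. This is only
a sketch with a made-up sample string standing in for the stylesheet
output; it assumes nothing beyond the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Stand-in for the transformation output: a character string holding
# a code point above 0xFF, so Perl has to store it with the flag set.
my $chars = "Gr\x{FC}\x{DF}e \x{263A}";

# What would go over the wire.
my $octets = encode( 'UTF-8', $chars );

# is_utf8() reports the internal flag: set on the character string,
# unset on the encoded octets.
my $flag_before = is_utf8($chars)  ? 1 : 0;
my $flag_after  = is_utf8($octets) ? 1 : 0;
```

Note that the flag is an internal detail. It is handy for a sanity
check like this one, but it is not a general test for "is this text".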

Michael Ludwig