perl5-porters
Encode and emitting the little endian form of UTF-16 (not UTF-16LE)
by Demerphq
May 23 2007 8:53AM
Hi Dan,

I was wondering if there is some way to get Encode to emit the little
endian version of UTF-16 (with BOM) as a typical Win32 on Intel app
would do. It seems to me that currently

my $octets= encode('UTF-16',$string);

will only emit the big-endian form of it.
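
For reference, a minimal sketch that shows what I mean. The sample string and variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $string = "A";                        # illustrative sample text
my $octets = encode('UTF-16', $string);

# Dump the raw octets as hex: the BOM comes first, then the character.
print unpack('H*', $octets), "\n";       # prints feff0041
```

The leading `fe ff` is the big-endian BOM, and the `00 41` that follows is "A" in big-endian order.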

Of course, well-behaved apps shouldn't care, but some do. I also know I
can emit the BOM by hand, like so:

my $octets= encode('UTF-16LE',chr(0xFEFF).$string);

but this struck me as a bit convoluted, and it makes things a bit tricky
to do with IO layers. If there isn't a way to do it currently, maybe the
name 'UTF-16:le' or something similar could be used for this?
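
To make the workaround concrete, here is a rough sketch of both forms: the hand-emitted BOM with encode(), and the same trick through an IO layer. The in-memory filehandle and the sample string are only there so the result can be inspected; they are assumptions for illustration, not a proposed interface:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $string = "A";    # illustrative sample text

# Form 1: prepend U+FEFF by hand, then encode as BOM-less UTF-16LE.
my $octets = encode('UTF-16LE', "\x{FEFF}" . $string);
print unpack('H*', $octets), "\n";    # prints fffe4100

# Form 2: the same thing through an IO layer. The BOM has to be
# printed explicitly, exactly once, before any other output.
my $buffer = '';
open my $fh, '>:raw:encoding(UTF-16LE)', \$buffer
    or die "open failed: $!";
print {$fh} "\x{FEFF}";
print {$fh} $string;
close $fh or die "close failed: $!";
print unpack('H*', $buffer), "\n";    # prints fffe4100
```

The awkward part with the layer is that every piece of code that opens such a handle has to remember to write the BOM itself, and only at the start of the stream.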

Also it looks like there is a typo in the quick reference table of
Encode::Unicode:

    Quick Reference
                        Decodes from ord(N)           Encodes chr(N) to...
               octet/char BOM S.P d800-dfff  ord >  0xffff     \x{1abcd} ==
          ---------------+-----------------+------------------------------
          UCS-2BE       2   N   N  is bogus                  Not Available
          UCS-2LE       2   N   N     bogus                  Not Available
          UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
          UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
          UTF-16LE      2   N   Y       S.P           S.P    0x2ad8,0xcddf
          UTF-32        4   Y   -  is bogus         As is            BE/LE
          UTF-32BE      4   N   -     bogus         As is       0x0001abcd
          UTF-32LE      4   N   -     bogus         As is       0xcdab0100
          UTF-8       1-4   -   -     bogus    >= 4 octets   \xf0\x9a\af\8d
          ---------------+-----------------+------------------------------

Shouldn't UTF-16LE also be 2/4 like the other UTF-16 variants?
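
A small sketch that checks this, using the same \x{1abcd} character as the table. The variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+1ABCD lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
my $le  = encode('UTF-16LE', "\x{1abcd}");
my $be  = encode('UTF-16BE', "\x{1abcd}");
my $bmp = encode('UTF-16LE', "A");    # a character inside the BMP

printf "UTF-16LE U+1ABCD: %d octets, %s\n", length($le),  unpack('H*', $le);
printf "UTF-16BE U+1ABCD: %d octets, %s\n", length($be),  unpack('H*', $be);
printf "UTF-16LE U+0041:  %d octets, %s\n", length($bmp), unpack('H*', $bmp);
```

The supplementary character comes out as 4 octets in both byte orders (2ad8cddf and d82adfcd, matching the last column of the table), while the BMP character takes 2.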

cheers,
yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"