ASPN ActiveState Programmer Network
perl5-porters
Advance warning of tweaks to Encode API.
by Nick Ing-Simmons
Jan 30 2002 6:02PM
Nick Ing-Simmons <nick@[...].net>  writes:
>  >   You can use t/table.euc under Jcode module for instance.  table.utf8
>  > in my code example is just a utf8 version thereof. That's data which
>  > contains all characters defined in EUC (well, actually JISX0212 is not
>  > included but very few environments can display JISX0212).
> 
> It is really great to have some valid data!
> 
> For a start it has found a bug in :encoding layer - knew there must be some...
> (I think I have rediscovered the multi-byte char spanning buffer boundary
> bug ... which I could not reproduce before)

That is it - :encoding needs some serious re-work for any encoding
which will whinge about partial characters (8-bit never does, and 16-bit
is unlikely to with even-length buffers - but multi-byte encodings can).
But since layers are much more stable now it can be recoded in a
better manner anyway.

To do that it needs to know why encode/decode stopped - did they "fail"
or just "pause"? So the ->decode and ->encode methods are going to get tweaked
as hinted at in the existing POD.
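
To make the buffer-boundary problem concrete, here is a minimal sketch.
It uses the Encode module as it is available from CPAN today
(decode() with Encode::FB_QUIET), which is NOT the interface being
designed in this message; the helper name decode_chunks is made up for
illustration. Each chunk is appended to a pending buffer, and whatever
decode() could not consume is left there for the next round.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a stream that arrives in arbitrary chunks.
# With FB_QUIET, decode() returns what it managed to convert and leaves
# the unprocessed octets in its second argument.
sub decode_chunks {
    my ($encoding, @chunks) = @_;
    my ($pending, $text) = ('', '');
    for my $chunk (@chunks) {
        $pending .= $chunk;
        $text .= decode($encoding, $pending, Encode::FB_QUIET());
    }
    # Anything still in $pending was never decoded: either a character
    # cut off at the end of input, or octets that are simply invalid.
    return ($text, $pending);
}

# "cafe" with e-acute (0xC3 0xA9 in UTF-8), split in mid-character.
my ($text, $left) = decode_chunks('UTF-8', "caf\xC3", "\xA9!");
```

Note that the caller cannot tell from $pending alone whether the
decoder "paused" on an incomplete character or "failed" on garbage;
invalid octets would just sit in the buffer. That is the ambiguity
described above.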

I am currently leaning towards allowing "check" to be a reference,
something like:

$uni = $enc->decode($octets);        # best attempt + replacement chars
$uni = $enc->decode($octets,0);      # croak on error ?
$uni = $enc->decode($octets,1);      # stop on error
$uni = $enc->decode($octets,\$err);  # stop on error, reason code in $err
$uni = $enc->decode($octets,\&foo);  # call foo on error - protocol TBD
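
As a rough sketch of how a decoder might dispatch on the five forms of
"check" listed above, here is a toy in pure Perl. Everything in it is
an assumption made for illustration: toy_decode is a made-up function
that accepts 7-bit ASCII only, the reason string is invented, and the
callback protocol (called with the offending byte value, returns the
replacement text) is just one guess, since the real protocol is TBD.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical toy decoder: any byte >= 0x80 counts as an error.
# Returns (decoded text, number of octets consumed).
sub toy_decode {
    my ($octets, @rest) = @_;
    my $have_check = @rest > 0;     # tell "no check" apart from check == 0
    my $check      = $rest[0];
    my ($out, $pos) = ('', 0);

    while ($pos < length $octets) {
        my $byte = substr($octets, $pos, 1);
        if (ord($byte) < 0x80) { $out .= $byte; $pos++; next; }

        my $why = sprintf 'invalid byte 0x%02X at offset %d', ord($byte), $pos;
        if    (!$have_check)           { $out .= "\x{FFFD}" }           # replace
        elsif (ref $check eq 'CODE')   { $out .= $check->(ord($byte)) } # callback
        elsif (ref $check eq 'SCALAR') { $$check = $why; last }         # stop + reason
        elsif ($check)                 { last }                         # stop
        else                           { croak $why }                   # croak
        $pos++;
    }
    return ($out, $pos);
}

my $err;
my ($uni, $used) = toy_decode("ab\xFFcd", \$err);
```

Returning the number of octets consumed alongside the text is a choice
made for this sketch, so that the caller can see where decoding stopped.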

I need to think through a sane set of "numeric" check options - perhaps
a "mask" of which errors are croaked on/replaced/stopped at/ignored?
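
One possible shape for such a mask, purely as a sketch: give each class
of error a two-bit field holding the action to take. The constant names,
the choice of error classes and the bit layout are all assumptions, and
this layout makes no attempt to stay compatible with the plain 0 and 1
values shown in the list above.

```perl
use strict;
use warnings;

# Hypothetical error classes (field index) and actions (2-bit value).
use constant { ERR_NO_REPRESENTATION => 0, ERR_PARTIAL_CHAR => 1 };
use constant { ACT_REPLACE => 0, ACT_CROAK => 1, ACT_STOP => 2, ACT_IGNORE => 3 };

# Pack (class, action) pairs into one integer, two bits per error class.
sub make_check {
    my @pairs = @_;
    my $mask  = 0;
    while (my ($class, $action) = splice @pairs, 0, 2) {
        $mask |= ($action & 3) << (2 * $class);
    }
    return $mask;
}

# Extract the action chosen for one error class.
sub action_for {
    my ($mask, $class) = @_;
    return ($mask >> (2 * $class)) & 3;
}

# Stop on unrepresentable characters, ignore partial ones.
my $check = make_check(ERR_NO_REPRESENTATION, ACT_STOP,
                       ERR_PARTIAL_CHAR,      ACT_IGNORE);
```

A decoder would then call action_for($check, ERR_PARTIAL_CHAR) at the
point where it hits an incomplete character and branch on the result.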

I think you can deduce something from the return value as well,
e.g. if it returns +ve length but does not consume the whole string
     then that is the result so far. To find out why,
     call it again - undef means no representation
                   - defined but zero length means partial char
                   - +ve length means we had run out of room
                     (does not occur at perl level as SV can grow...)
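
The "call it again" convention above could be wrapped up roughly like
this. It is only a sketch: why_stopped is a made-up name, and $decode
stands for any code reference that follows the convention just
described when handed the unconsumed octets.

```perl
use strict;
use warnings;

# Hypothetical helper: call the decoder a second time on the octets it
# left unconsumed and classify the reason it stopped.
sub why_stopped {
    my ($decode, $remaining) = @_;
    my $again = $decode->($remaining);
    return 'no representation' unless defined $again;
    return 'partial character' if length($again) == 0;
    return 'out of room';
}

# Stand-in decoder that always reports an incomplete character.
my $reason = why_stopped(sub { '' }, "\xC3");
```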




--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
