Re: Continuing Perl 5.8.0 Problems
by Steve Hay
Oct 2 2002 4:30PM
Petr Pajas wrote:
> Steve Hay <steve.hay@[...].com> writes:
>
> >Thus,
> >
> > $str = decode('utf8', $str);
>
> No, I hope I wrote that this is supposed to read the UTF-8 encoded octets,
> check that they form a valid UTF-8 encoded string, and turn the UTF8 flag ON.
>
> If you want to turn it OFF for a UTF-8 encoded string, you simply use
>
> $str = encode('utf8', $str);
>
> It takes the input string with the UTF-8 flag on and "encodes" it into
> UTF-8 octets, which effectively means that it only turns the UTF-8
> flag off.
>
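A minimal sketch of the round trip Petr describes, using Encode's `decode`, `encode` and `Encode::is_utf8`. The sample octets (0xC2 0x81, the UTF-8 encoding of U+0081) are an assumption chosen to match the data in the script further down; this is an illustration on a current perl, not a reproduction of the 5.8.0 setup.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The octets 0xC2 0x81 are the UTF-8 encoding of the single character U+0081.
my $octets = "\xC2\x81";

# decode(): interpret the octets as UTF-8 and return a character string;
# for this non-ASCII input the UTF-8 flag comes back ON.
my $chars = decode('utf8', $octets);
printf "decoded: %d character(s), UTF-8 flag is %s\n",
    length($chars), Encode::is_utf8($chars) ? 'ON' : 'OFF';

# encode(): turn the characters back into UTF-8 octets, flag OFF.
my $back = encode('utf8', $chars);
printf "encoded: %d octet(s), UTF-8 flag is %s\n",
    length($back), Encode::is_utf8($back) ? 'ON' : 'OFF';
```

This should report one character with the flag ON after `decode`, and two octets with the flag OFF after `encode`.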
I've now tried this as well: initially it croaked with the error "can't
convert!", presumably on one of the strings that was not flagged UTF-8
to start with.
So then I tried:
    $str = encode('utf8', $str) if Encode::is_utf8($str);
This now produces the same result as:
    Encode::_utf8_off($str);
i.e. no errors from XmlUtf8Decode() any more, but various other tests
still fail, and I still don't know if this is the right thing to be
doing. The reason that I was trying to turn the UTF-8 flag off is that
the substitution being done by encodeText() fails if the flag is on.
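A minimal sketch comparing the two approaches above on a flagged string and on a byte string that was never flagged. The helper names `via_encode` and `via_utf8_off` are hypothetical, introduced only for this illustration. The equivalence assumes the flagged string holds Perl's internal UTF-8 form (true for the output of `decode`); `_utf8_off` is documented by Encode as an internal function, so `encode` is the safer of the two.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two U+0081 characters, UTF-8 flag ON (the same data the script below uses).
my $flagged = decode('utf8', "\xC2\x81" x 2);
# A plain byte string that was never flagged.
my $plain = "\xC2\x81";

# Hypothetical helper: encode only strings that carry the UTF-8 flag.
sub via_encode {
    my ($s) = @_;
    $s = encode('utf8', $s) if Encode::is_utf8($s);
    return $s;
}

# Hypothetical helper: clear the flag on a copy, exposing the internal bytes.
sub via_utf8_off {
    my ($s) = @_;
    Encode::_utf8_off($s);
    return $s;
}

for my $s ($flagged, $plain) {
    my ($x, $y) = (via_encode($s), via_utf8_off($s));
    printf "%vd | %vd | %s\n", $x, $y, $x eq $y ? 'same' : 'different';
}
```

The `if Encode::is_utf8(...)` guard matters: without it, `encode` would treat each byte of an unflagged string as a Latin-1 character and re-encode it, whereas `_utf8_off` leaves such a string alone.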
The following simple program demonstrates the same thing. Can you
explain it? It attempts to replace each of two UTF-8 encoded characters
with the ASCII `.' character (decimal 46):
# --- START OF SCRIPT ---
use strict;
use warnings;
use bytes;
use Encode;
sub dot { return chr(46) }
my $str = decode('utf8', (chr(194) . chr(129)) x 2);
my $str1 = $str;
printf "str1 = %vd, UTF-8 flag is %s\n",
    $str1, Encode::is_utf8($str1) ? 'ON' : 'OFF';
my $res1 = $str1 =~ s/(\xC2.)/dot()/egs;
printf "str1 = %vd after $res1 substitutions\n", $str1;
my $str2 = encode('utf8', $str);
printf "str2 = %vd, UTF-8 flag is %s\n",
    $str2, Encode::is_utf8($str2) ? 'ON' : 'OFF';
my $res2 = $str2 =~ s/(\xC2.)/dot()/egs;
printf "str2 = %vd after $res2 substitutions\n", $str2;
# --- END OF SCRIPT ---
This produces the following output:
# --- START OF OUTPUT ---
str1 = 194.129.194.129, UTF-8 flag is ON
str1 = 194.129.194.129 after substitutions
str2 = 194.129.194.129, UTF-8 flag is OFF
str2 = 46.46 after 2 substitutions
# --- END OF OUTPUT ---
Why does the substitution fail if the UTF-8 flag is on?
- Steve
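A minimal sketch of the character-versus-byte distinction the output points to. It deliberately drops `use bytes`, so it is not a reproduction of the 5.8.0 run above; under `use bytes` the `%vd` output appears to show Perl's internal UTF-8 bytes (194.129...), while the decoded string itself holds two U+0081 characters. The sketch assumes a current perl and reuses the script's `dot()` helper.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

sub dot { return chr(46) }

# Two U+0081 characters, UTF-8 flag ON. Note: no 'use bytes' in this sketch.
my $chars = decode('utf8', "\xC2\x81" x 2);

# 1. Byte pattern on the character string: there is no U+00C2 character,
#    so nothing matches and the string is left unchanged.
my $str1 = $chars;
my $res1 = ($str1 =~ s/\xC2./dot()/egs) || 0;

# 2. Character pattern on the character string: each U+0081 is replaced.
my $str2 = $chars;
my $res2 = ($str2 =~ s/\x{81}/dot()/egs) || 0;

# 3. Byte pattern on the encoded octets: each 0xC2 0x81 pair is replaced.
my $str3 = encode('utf8', $chars);
my $res3 = ($str3 =~ s/\xC2./dot()/egs) || 0;

for my $row (['chars, /\xC2./',  $res1, $str1],
             ['chars, /\x{81}/', $res2, $str2],
             ['octets, /\xC2./', $res3, $str3]) {
    printf "%-16s %d substitution(s), result %vd\n", @$row;
}
```

In other words, a pattern written in terms of UTF-8 bytes only lines up with a string that is still UTF-8 octets; on a decoded string the pattern would need to name the characters themselves.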
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs