Re: UTF8 in 5.8.1
by Gisle Aas
Feb 28 2005 11:23AM
Aaron Sherman <ajs@[...].com> writes:
> Is anyone aware of any limitations in 5.8.1 that would lead to a problem
> using substr on utf8 strings? I'm getting lots of:
>
> Malformed UTF-8 character (unexpected end of string)
>
> errors in a function that's dealing only with strings that are read from
> a file that was written to a file, and is being read back using the
> :utf8 encoding layer. I had thought that substr was always safe on such
> strings, but it's starting to look like that was a vain hope....
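The :utf8 layer just slaps on the UTF8 flag, trusting the data it reads
to be well-formed UTF-8. It does no validation, so any malformed bytes
in the file only show up later, as warnings from functions like substr.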
You can use the :encoding(UTF-8) layer if you don't trust the file
content to be valid UTF8, and then set $PerlIO::encoding::fallback to
specify what to do with bad sequences.
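
Something along these lines is what I have in mind. It is only a sketch:
the temp file and its contents are made up for illustration, and the
choice of Encode::FB_DEFAULT (substitute U+FFFD for bad sequences) is
just one possible policy. The fallback has to be set before the open,
since the layer picks it up when it is pushed.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding ();   # load first, so the module's default does not overwrite our setting
use File::Temp qw(tempfile);

# Hypothetical sample file: one valid line, then a line with a lone 0xC3 lead byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "caf\xC3\xA9\n", "bad\xC3\n";
close $out or die "close: $!";

# Assumed policy: replace malformed sequences with U+FFFD. STOP_AT_PARTIAL is
# kept so a character split across a read-buffer boundary is not misdecoded.
$PerlIO::encoding::fallback = Encode::FB_DEFAULT() | Encode::STOP_AT_PARTIAL();

# The :encoding(UTF-8) layer validates the bytes while decoding them.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
chomp(my @lines = <$in>);
close $in;

# The strings are now well-formed, so substr operates on whole characters.
my $tail = substr($lines[0], 3, 1);
```

If you would rather see the bad bytes than lose them, the default
fallback (Encode::PERLQQ with a warning) leaves them in the string as
\xHH escapes instead.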
Regards,
Gisle