Re: utf-8 (or not) encoding question
by Martin Leese
Dec 9 2004 8:45PM
>
> Subject:
> utf-8 (or not) encoding question
> From:
> Joshua Santelli <santellij@[...].com>
> Date:
> Thu, 9 Dec 2004 10:21:22 -0800 (PST)
> To:
> perl-xml@[...].com
>
>
> Hello,
>
> I'm using XML::LibXML to parse a file that I have.
> The character in question looks like one byte (F3)
> when I `less` the file on UNIX:
>
> analysis and algebraic topology, such as
> Calder<F3>n-Zygmund theory
>
> This is the error I get when I parse_file() the file:
>
>
...
> Is LibXML correct in thinking that this is not
> UTF-8?
>
Yes.
> Is there an easy way for me to tell if this
> (or any file) is properly encoded as UTF-8?
>
>
I believe you have found such a way.
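Outside of XML::LibXML, the same check amounts to running the bytes through a strict UTF-8 decoder and seeing where it stops. The sketch below swaps in Python's built-in decoder instead of Perl. The function names `first_invalid_utf8` and `check_file` are hypothetical, made up for illustration. One caveat: a file that decodes cleanly is only *consistent* with UTF-8 (plain ASCII passes, for example), but a decoding failure does prove the file is not UTF-8.

```python
import sys
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return None if data is valid UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str) -> None:
    """Hypothetical helper: report the first non-UTF-8 byte in a file, if any."""
    with open(path, "rb") as fh:  # read raw bytes; do not let Python decode
        data = fh.read()
    offset = first_invalid_utf8(data)
    if offset is None:
        print(f"{path}: decodes as UTF-8")
    else:
        line = data.count(b"\n", 0, offset) + 1
        print(f"{path}: byte {data[offset]:02X} at offset {offset} "
              f"(line {line}) is not valid UTF-8")


if __name__ == "__main__":
    for p in sys.argv[1:]:
        check_file(p)
```

On the line from the original message, `first_invalid_utf8(b"Calder\xf3n-Zygmund theory")` points at offset 6, which is the F3 byte.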
> What's wrong with F3 (&#243;)?
>
>
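Nothing. It simply isn't UTF-8 encoded.
In UTF-8, the byte F3 can only start a four-byte sequence, and the
byte after it (6E, the letter n) is not a continuation byte, so the
parser rejects it.
F3 is the ISO-8859-1 (Latin-1) encoding of small letter o with
acute, which is Unicode code point U+00F3.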
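ISO-8859-1 maps each byte value directly to the Unicode code point with the same number, so this interpretation is easy to confirm. The sketch below uses Python rather than Perl. The helper name `latin1_char_info` is hypothetical.

```python
import unicodedata


def latin1_char_info(byte_value: int) -> str:
    """Describe what a single byte means when read as ISO-8859-1 (Latin-1)."""
    ch = bytes([byte_value]).decode("latin-1")  # byte N becomes code point U+00NN
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(latin1_char_info(0xF3))  # U+00F3 LATIN SMALL LETTER O WITH ACUTE
```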
To see how to encode this codepoint as UTF-8, visit:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G7404
and look at Table 3-5.
I calculate that the correct UTF-8 encoding for this codepoint
would be the pair of bytes C3 B3.
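The calculation follows the two-byte row of the UTF-8 table. A code point in the range U+0080 to U+07FF is split into its top 5 bits and low 6 bits, packed as 110yyyyy 10xxxxxx. Below is a minimal Python sketch of that row only. The function name `utf8_two_byte` is hypothetical, and the result is checked against Python's own encoder.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a two-byte UTF-8 sequence."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point needs a different UTF-8 length")
    lead = 0xC0 | (cp >> 6)     # 110yyyyy: top 5 of the 11 bits
    trail = 0x80 | (cp & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, trail])


encoded = utf8_two_byte(0xF3)
print(encoded.hex(" ").upper())  # C3 B3
assert encoded == "\u00f3".encode("utf-8")
```

For U+00F3, 0xF3 >> 6 is 3, giving C3, and 0xF3 & 0x3F is 0x33, giving B3.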
Regards,
Martin
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs