perl-xml
Re: utf-8 (or not) encoding question
by Martin Leese
Dec 9 2004 8:45PM
> Subject: utf-8 (or not) encoding question
> From: Joshua Santelli <santellij@[...].com>
> Date: Thu, 9 Dec 2004 10:21:22 -0800 (PST)
> To: perl-xml@[...].com
> 
> Hello,
> 
> I'm using XML::LibXML to parse a file that I have. 
> The character in question looks like one byte (F3)
> when I `less` the file on UNIX:
> 
> analysis and algebraic topology, such as
> Calder<F3>n-Zygmund theory
> 
> This is the error I get when I parse_file() the file:
>   
> 
...

> Is LibXML correct in thinking that this is not
> UTF-8?  
> 
Yes.

> Is there an easy way for me to tell if this
> (or any file) is properly encoded as UTF-8?
>   
> 
I believe you have found such a way.
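
The same kind of check (try to decode the bytes as UTF-8 and see whether it fails) can be sketched outside the parser. This is a minimal illustration in Python rather than Perl, and the function name first_invalid_utf8_offset is hypothetical, not part of XML::LibXML or any other library:

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8, or None if valid."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as err:
        return err.start  # index of the first offending byte
    return None


# The text from the original message, with the single byte F3 in place
sample = b"Calder\xf3n-Zygmund theory"
print(first_invalid_utf8_offset(sample))
```

For a file, the bytes would be read in binary mode (for example with open(path, "rb")) before being passed to the function.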

> What's wrong with F3 (&#243;)?
>   
> 
Nothing.  It simply isn't UTF-8 encoded.

It is the ISO-8859-1 (Latin-1) encoding for a small letter
o with acute.  This is Unicode code point U+00F3.

To see how to encode this codepoint as UTF-8, visit:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G7404
and look at Table 3-5.

I calculate that the correct UTF-8 encoding for this codepoint
would be the pair of bytes C3 B3.
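
The calculation can be checked with a short sketch. It assumes the two-byte UTF-8 form (110yyyyy 10xxxxxx), which covers code points U+0080 through U+07FF. It is written in Python for illustration, and the helper name utf8_two_byte is hypothetical:

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF using the two-byte UTF-8 form."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0xC0 | (code_point >> 6)     # 110yyyyy: the upper five bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the lower six bits
    return bytes([lead, trail])


# The Latin-1 byte F3 maps directly to code point U+00F3
code_point = ord(b"\xf3".decode("latin-1"))

encoded = utf8_two_byte(code_point)
print(encoded.hex())  # c3b3

# Cross-check against the built-in encoder
assert encoded == chr(code_point).encode("utf-8")
```

For U+00F3 the bits are 00011 110011, so the lead byte is 11000011 (C3) and the trailing byte is 10110011 (B3).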

Regards,
Martin
