perl-xml
Re: Handling £ (pound sterling) symbols in content
by Grant McLean other posts by this author
Jun 27 2009 2:40PM messages near this date
Hi Neil

You need to determine what encoding has been used by the database export
process.

If you're working with a Windows system then the most likely guess is
that the data is encoded with CP1252 ('Win-Latin-1'), in which case
the first line of the XML file should specify an encoding like this:

  <?xml version="1.0" encoding="WINDOWS-1252" ?> 
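As a rough illustration of the declaration above, here is a minimal sketch that feeds XML::Twig a string containing the raw CP1252 byte for '£'. The sample data is made up, and it assumes the WINDOWS-1252 encoding map that ships with XML::Parser is installed:

```perl
use strict;
use warnings;
use XML::Twig;

# "\xA3" is the single byte CP1252 uses for the pound sign.
# The document content here is illustrative, not taken from the real export.
my $input = qq{<?xml version="1.0" encoding="WINDOWS-1252"?>}
          . qq{<root><item>one \xA3</item><item>two</item></root>};

my $t = XML::Twig->new();
$t->parse($input);    # the declared encoding tells expat how to read byte 0xA3

# The parser hands back character data, so the pound sign is now U+00A3.
my $text = $t->root->first_child('item')->text;

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

Without the encoding declaration the same bytes are treated as UTF-8, where a lone 0xA3 is not a valid sequence, which would explain the "not well-formed (invalid token)" error.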

If the data was encoded with UTF-8 then the XML parser module would have
recognised it automatically, so you can eliminate that option.
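If you want to check the UTF-8 question directly rather than infer it, one possible approach is to attempt a strict UTF-8 decode with the core Encode module and fall back to CP1252 when that fails. The helper name decode_guess is hypothetical, and the fallback encoding is only the guess discussed above:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first, then assume CP1252.
sub decode_guess {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bytes intact.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return ('UTF-8', $text) if defined $text;
    return ('cp1252', decode('cp1252', $bytes));
}

my ($enc1, $text1) = decode_guess("one \xA3");      # lone 0xA3: not valid UTF-8
my ($enc2, $text2) = decode_guess("one \xC2\xA3");  # UTF-8 bytes for U+00A3

printf "%s U+%04X\n", $enc1, ord substr($text1, -1);
printf "%s U+%04X\n", $enc2, ord substr($text2, -1);
```

A successful UTF-8 decode is strong evidence but not proof, and a failed one only tells you the data is in some single-byte or other encoding; CP1252 remains an assumption to confirm against the export tool.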

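Note that ISO-8859-1 (Latin-1) stores the pound sign at the same byte
(0xA3) as CP1252, so declaring either encoding would get the '£' past
the parser.  CP1252 is still the safer guess for Windows data, because
ISO-8859-1 pre-dates the definition of the Euro symbol and so has no
code for it.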

Encodings in XML (and Perl) are a largish subject with many subtleties.
You can read more here:

  http://perl-xml.sourceforge.net/faq/#encodings

Cheers
Grant

On Sat, 2009-06-27 at 22:24 +0100, Neil Hughes wrote:
>  I've hit a problem in XML::Twig trying to handle data exported from a 
>  legacy database, but I suspect this is an issue I need to get some 
>  advice on regardless of the parser...
>  
>  The data contains '£' symbols which I'm struggling to format in XML for 
>  processing later on. The following code might help explain:
>  
>  ------------ BEGIN --------------
>  
>  use strict;
>  use warnings;
>  
>  use XML::Twig;
>  
>  my $t= XML::Twig->new();
>  
>  # this is OK
>  #my $input = '<?xml version="1.0"?><root><item>one</item><item>two</item><item>three</item></root>';
>  
>  # this is invalid
>  #my $input = '<?xml version="1.0"?><root><item>one £</item><item>two</item><item>three</item></root>';
>  
>  # this is OK
>  #my $input = '<?xml version="1.0"?><root><item><![CDATA[one]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]> </item></root>';
>  
>  # this is invalid
>  my $input = '<?xml version="1.0"?><root><item><![CDATA[one £]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>';
>  
>  
>  $t->parse($input);
>  $t->print;
>  
>  ------------ END --------------
>  
>  Whether I wrap my text data in CDATA or not, as soon as I include a 
>  pound sterling symbol I get the following error:
>  
>  not well-formed (invalid token) at line 1, column 46, byte 46 at 
>  /usr/local/ActivePerl-5.8/lib/XML/Parser.pm line 187
>    at /Users/nkh/Documents/Dev/Perl/xml_twig/pound_test1.pl line 14
>  
>  Byte 46 seems to align with the '£', so I'm wondering what I need to do 
>  to get this character not to break the parser.
>  
