ASPN ActiveState Programmer Network
perl-xml
Re: Handling £ (pound sterling) symbols in content
by Neil Hughes
Jun 28 2009 2:22AM
Thanks Grant... that fixed it.

The database originally started with a DOS application and still encodes 
'£' symbols weirdly, so my other (non-Perl) Windows application was 
extracting the data into XML and adding the correct '£' symbol, but I 
was not specifying the encoding as you pointed out.

If I specify "WINDOWS-1252" or "ISO-8859-1", my XML::Twig code can parse 
it without a problem. I'll probably use the former because I suspect 
there are Euro symbols in the data somewhere.
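To see why the choice matters, here is a minimal sketch using Perl's core Encode module (the sub name `code_points` is just for illustration). It decodes the same two bytes as CP1252 (Encode's name for WINDOWS-1252) and as ISO-8859-1: the pound sign comes out identically, but byte 0x80 is only a Euro sign in CP1252.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes with the named encoding and return
# each resulting character as a "U+XXXX" string so the mapping is visible.
sub code_points {
    my ($encoding, $bytes) = @_;
    my $text = decode($encoding, $bytes);
    return map { sprintf('U+%04X', ord($_)) } split //, $text;
}

# 0xA3 is the pound sign in both encodings; 0x80 is the Euro sign in
# CP1252 but a C1 control character in ISO-8859-1.
my $bytes = "\xA3\x80";
for my $enc ('cp1252', 'iso-8859-1') {
    printf "%-10s %s\n", $enc, join ' ', code_points($enc, $bytes);
}
```

Both lines should start with U+00A3; only the cp1252 line should show U+20AC (the Euro sign) for the second byte.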

Much appreciated
-- 
Neil Hughes


On 27/6/09 22:39, Grant McLean wrote:
>  Hi Neil
>  
>  You need to determine what encoding has been used by the database export
>  process.
>  
>  If you're working with a Windows system then the most likely guess is
>  that the data is encoded with CP1252 or 'Win-Latin-1'.  In which case
>  the first line of the XML file should specify an encoding like this:
>  
>    <?xml version="1.0" encoding="WINDOWS-1252" ?>
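If the export process can't be changed to write that declaration itself, something like the following could patch it in before parsing. This is a hypothetical helper (`with_encoding_decl` is not an XML::Twig or XML::Parser function) written with plain Perl regexes; it only labels the bytes and does not convert them, so the encoding it is given has to be the one the data was actually written in.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure an XML document's declaration names an
# encoding. It only adds a label; the bytes themselves are left unchanged.
sub with_encoding_decl {
    my ($xml, $encoding) = @_;
    if ($xml =~ /\A<\?xml\b([^?]*)\?>/) {
        my $attrs = $1;
        # Declaration already names an encoding: leave it alone.
        return $xml if $attrs =~ /\bencoding\s*=/;
        # Otherwise insert the encoding right after the version pseudo-attribute.
        $xml =~ s/\A(<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding="$encoding"/;
        return $xml;
    }
    # No declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="$encoding"?>} . $xml;
}

print with_encoding_decl('<?xml version="1.0"?><root/>', 'WINDOWS-1252'), "\n";
```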
>  
>  If the data was encoded with UTF-8 then the XML parser module would have
>  recognised it automatically, so you can eliminate that option.
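One way to rule UTF-8 in or out before involving the XML parser at all, sketched with the core Encode module (the sub name `is_valid_utf8` is hypothetical): decoding with `FB_CROAK` dies on the first malformed sequence, so a lone 0xA3 byte fails while the two-byte UTF-8 form of '£' (0xC2 0xA3) passes.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: return 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode may modify its argument when a CHECK flag is given,
    # so work on a copy of the caller's data.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

print is_valid_utf8("one \xA3") ? "UTF-8\n" : "not UTF-8\n";
```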
>  
>  It may also be ISO-8859-1 (Latin-1), but that encoding pre-dates the
>  definition of the Euro symbol, so if the data could contain Euro signs,
>  CP1252 is the safer choice.
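A rough way to check for that, assuming the data is known to be either CP1252 or ISO-8859-1: the two encodings differ only in bytes 0x80 to 0x9F, so this sketch (a hypothetical heuristic, `suggests_cp1252`) just looks for bytes in that range.

```perl
use strict;
use warnings;

# Hypothetical heuristic: CP1252 and ISO-8859-1 agree on every byte except
# 0x80-0x9F. Latin-1 maps that range to C1 control characters, which real
# text almost never contains, while CP1252 maps most of it to printable
# characters such as the Euro sign (0x80). Any byte in that range therefore
# points towards CP1252; with none present, both labels decode identically.
sub suggests_cp1252 {
    my ($bytes) = @_;
    return $bytes =~ /[\x80-\x9F]/ ? 1 : 0;
}

print suggests_cp1252("Price: \x{80}5") ? "CP1252\n" : "either\n";
```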
>  
>  Encodings in XML (and Perl) are a largish subject with many subtleties;
>  you can read more here:
>  
>    http://perl-xml.sourceforge.net/faq/#encodings
>  
>  Cheers
>  Grant
>  
>  On Sat, 2009-06-27 at 22:24 +0100, Neil Hughes wrote:
> > I've hit a problem in XML::Twig trying to handle data exported from a 
> > legacy database, but I suspect this is an issue I need to get some 
> > advice on regardless of the parser...
> >
> > The data contains '£' symbols which I'm struggling to format in XML for 
> > processing later on. The following code might help explain:
> >
> > ------------ BEGIN --------------
> >
> > use strict;
> > use warnings;
> >
> > use XML::Twig;
> >
> > my $t= XML::Twig->new();
> >
> > # this is OK
> > #my $input = '<?xml version="1.0"?><root><item>one</item><item>two</item><item>three</item></root>';
> >
> > # this is invalid
> > #my $input = '<?xml version="1.0"?><root><item>one £</item><item>two</item><item>three</item></root>';
> >
> > # this is OK
> > #my $input = '<?xml version="1.0"?><root><item><![CDATA[one]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>';
> >
> > # this is invalid
> > my $input = '<?xml version="1.0"?><root><item><![CDATA[one £]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>';
> >
> > $t->parse($input);
> > $t->print;
> >
> > ------------ END --------------
> >
> > Whether I wrap my text data in CDATA or not, as soon as I include a 
> > pound sterling symbol I get the following error:
> >
> > not well-formed (invalid token) at line 1, column 46, byte 46 at 
> > /usr/local/ActivePerl-5.8/lib/XML/Parser.pm line 187
> >   at /Users/nkh/Documents/Dev/Perl/xml_twig/pound_test1.pl line 14
> >
> > Byte 46 seems to align with the '£', so I'm wondering what I need to do
> > so that this character doesn't break the parser.
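The offset can be confirmed directly. This sketch rebuilds the CDATA test string with the pound sign written as the single byte 0xA3, which is how both Latin-1 and CP1252 store it (an assumption about how the original script file was saved), and locates it with Perl's 0-based `index`.

```perl
use strict;
use warnings;

# The CDATA test document from the message, with '£' as the byte 0xA3.
my $input = qq{<?xml version="1.0"?><root><item><![CDATA[one \xA3]]></item>}
          . qq{<item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>};

# index() returns a 0-based character offset; every character here is one byte.
my $offset = index($input, "\xA3");
print "pound sign at byte $offset\n";
```

It should report byte 46, the same offset as in the error message.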
> >
>  
>  _______________________________________________
>  Perl-XML mailing list
>  Perl-XML@[...].com
>  To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


© ActiveState Software Inc. All rights reserved