Re: Handling £ (pound sterling) symbols in content
by Grant McLean
Jun 27 2009 2:40PM
Hi Neil
You need to determine what encoding has been used by the database export
process.
If you're working with a Windows system then the most likely guess is
that the data is encoded with CP1252 (Windows-1252, sometimes called
'Win-Latin-1'), in which case the first line of the XML file should
specify the encoding like this:
<?xml version="1.0" encoding="WINDOWS-1252" ?>
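As a rough sketch of the effect of that declaration, the script below feeds XML::Twig a document whose pound sign is the single CP1252 byte 0xA3. It assumes the XML::Parser installation includes its bundled windows-1252 encoding map; the sample document and the variable names are made up for illustration.

```perl
use strict;
use warnings;

use XML::Twig;

# Build the input as raw bytes: "\xA3" is the single byte CP1252 uses
# for the pound sign. The encoding declaration tells the parser how to
# interpret that byte.
my $input = '<?xml version="1.0" encoding="WINDOWS-1252"?>'
          . "<root><item>one \xA3</item><item>two</item></root>";

my $t = XML::Twig->new();
$t->parse($input);

# The parser hands text back as Perl character strings, so the pound
# sign should arrive as the single character U+00A3.
my $text = $t->root->first_child('item')->text;
printf "first item: %d characters, last one is U+%04X\n",
    length($text), ord(substr($text, -1));
```

Without the encoding attribute the same bytes should fail with the "not well-formed (invalid token)" error, because the parser then assumes UTF-8.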
If the data were encoded as UTF-8, the XML parser would have accepted it
without any declaration (UTF-8 is the default for XML), so you can
eliminate that option.
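One way to confirm that the export is not UTF-8 is to try decoding it strictly. The sketch below uses the core Encode module; is_valid_utf8 and hex_dump are hypothetical helpers written for this example, and the two sample strings stand in for the real export data.

```perl
use strict;
use warnings;

use Encode qw(decode encode);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $legacy_bytes = "one \xA3";                       # pound sign as one byte
my $utf8_bytes   = encode('UTF-8', "one \x{A3}");    # pound sign as two bytes

for my $sample ($legacy_bytes, $utf8_bytes) {
    printf "%-18s valid UTF-8? %s\n",
        hex_dump($sample), is_valid_utf8($sample) ? 'yes' : 'no';
}
```

A lone 0xA3 byte is not a legal UTF-8 sequence, which is consistent with the parser giving up at exactly the byte where the pound sign sits.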
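ISO-8859-1 (Latin-1) is a less likely candidate on Windows, and because
that encoding pre-dates the definition of the Euro symbol it cannot
represent one if your data contains any. The pound sign itself is the
same byte (0xA3) in both Latin-1 and CP1252, so declaring CP1252 is the
safer choice.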
Encodings in XML (and Perl) are a largish subject with many subtleties.
You can read more here:
http://perl-xml.sourceforge.net/faq/#encodings
Cheers
Grant
On Sat, 2009-06-27 at 22:24 +0100, Neil Hughes wrote:
> I've hit a problem in XML::Twig trying to handle data exported from a
> legacy database, but I suspect this is an issue I need to get some
> advice on regardless of the parser...
>
> The data contains '£' symbols which I'm struggling to format in XML for
> processing later on. The following code might help explain:
>
> ------------ BEGIN --------------
>
> use strict;
> use warnings;
>
> use XML::Twig;
>
> my $t= XML::Twig->new();
>
> # this is OK
> #my $input = '<?xml version="1.0"?><root><item>one</item><item>two</item><item>three</item></root>';
>
> # this is invalid
> #my $input = '<?xml version="1.0"?><root><item>one £</item><item>two</item><item>three</item></root>';
>
> # this is OK
> #my $input = '<?xml version="1.0"?><root><item><![CDATA[one]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>';
>
> # this is invalid
> my $input = '<?xml version="1.0"?><root><item><![CDATA[one £]]></item><item><![CDATA[two]]></item><item><![CDATA[three]]></item></root>';
>
>
> $t->parse($input);
> $t->print;
>
> ------------ END --------------
>
> Whether I wrap my text data in CDATA or not, as soon as I include a
> pound sterling symbol I get the following error:
>
> not well-formed (invalid token) at line 1, column 46, byte 46 at
> /usr/local/ActivePerl-5.8/lib/XML/Parser.pm line 187
> at /Users/nkh/Documents/Dev/Perl/xml_twig/pound_test1.pl line 14
>
> Byte 46 seems to align with the '£', so I'm wondering what I need to do
> to get this character not to break the parser.
>
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs