perl-xml
XML::LibXML error handling for non-UTF-8 data
by Ibrahim Dawud
Jun 6 2006 3:53AM

Dear Colleagues,

We communicate with our suppliers via XML messages over the web.
We currently use XML::LibXML (version 1.58) to parse our incoming messages:

Example XML:
<body>
  <product>
    <productID>3661</productID>
    <price>100</price>
    <name>Name</name>
    <categoryName>Name</categoryName>
    <categoryID>28</categoryID>
    .....
  </product>
  .....
</body>

Occasionally, a supplier will send an XML message in which a product
detail contains bytes that are not valid UTF-8. The wrong encoding of
that one data element makes the whole XML document not well-formed.

This results in a parser error as follows:

":1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x91 0x65 0x61 0x73 "

The parse then aborts.
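
For reference, the reported bytes can be checked outside the parser. The
sketch below is only an illustration: it uses the core Encode module to
confirm that the four bytes are not valid UTF-8, and shows how they would
read as Windows-1252. That the supplier's system emits Windows-1252 is an
assumption on our part; in that encoding 0x91 is a left single quotation
mark.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# The four bytes reported in the parser error.
my $bytes = "\x91\x65\x61\x73";

# Strict UTF-8 decoding dies on malformed input; eval traps the exception.
my $is_utf8 = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;

# Assumption: the data was produced as Windows-1252 (cp1252).
my $guess = decode('cp1252', $bytes, FB_CROAK | LEAVE_SRC);

printf "valid UTF-8: %s\n", $is_utf8 ? 'yes' : 'no';
printf "as cp1252: U+%04X followed by '%s'\n", ord($guess), substr($guess, 1);
# prints:
#   valid UTF-8: no
#   as cp1252: U+2018 followed by 'eas'
```

A lone 0x91 byte can never start a UTF-8 sequence (it has the form of a
continuation byte), which is why the parser stops at that point.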

We tried $parser->recover(1) so that the parser can skip over the
error. Unfortunately, this is not enough, since we have no way of
knowing that an error occurred and that the parser returned bad data
for that product. We cannot validate the product data ourselves
either, because no specific data format is expected.

What we need is some sort of error handling within the parsing method
that will detect non-UTF-8 data, raise an error, and then SKIP over
the whole product in the XML block that contains that error, and
continue to parse the rest of the XML document normally.

So we would appreciate your advice on how to solve this problem, or on
a workaround.

Example of the Perl code used to parse the message:
###################################################################
    use XML::LibXML;

    # $xml_msg holds the XML message received from the supplier
    my $parser = XML::LibXML->new();             # create a new parser object
    # $parser->recover(1);
    my $tree = $parser->parse_string($xml_msg);  # parse the XML message string
    my $root = $tree->getDocumentElement;        # get the root element <body>

    my $count = 0;
    my @ResultSet = ();

    foreach my $product ($root->findnodes('product')) {
        $ResultSet[$count][0] = $product->findvalue('productID');
        $ResultSet[$count][1] = $product->findvalue('name');
        $ResultSet[$count][3] = $product->findvalue('categoryName');
        $ResultSet[$count][4] = $product->findvalue('categoryID');
        $ResultSet[$count][5] = $product->findvalue('price');
        $count++;
    }
###################################################################
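
One possible workaround for the requirement described earlier (detect the
bad data, skip that product, continue with the rest) is to filter the raw
message before it reaches the parser. The following is a minimal sketch
under several assumptions, not a tested solution: the helper names
is_valid_utf8 and drop_bad_products are hypothetical, the message is
still a raw byte string, <product> elements carry no attributes and are
not nested, and no CDATA section contains the text "</product>". It uses
only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Return 1 if the byte string decodes cleanly as strict UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Hypothetical helper: remove every <product>...</product> block that
# contains malformed UTF-8. Returns the cleaned message and a reference
# to the list of 1-based positions of the products that were dropped.
sub drop_bad_products {
    my ($xml_msg) = @_;
    my $n = 0;
    my @skipped;
    $xml_msg =~ s{(<product>.*?</product>)}{
        my $chunk = $1;
        $n++;
        is_valid_utf8($chunk) ? $chunk : do { push @skipped, $n; '' };
    }gse;
    return ($xml_msg, \@skipped);
}

# Example message: the second product contains the byte 0x91.
my $msg = "<body>"
        . "<product><productID>1</productID><name>ok</name></product>"
        . "<product><productID>2</productID><name>\x91eas</name></product>"
        . "<product><productID>3</productID><name>fine</name></product>"
        . "</body>";

my ($clean, $skipped) = drop_bad_products($msg);
print "skipped product(s): @$skipped\n";   # prints: skipped product(s): 2
```

The cleaned string could then be passed to parse_string as in the code
above, and the skipped positions logged or reported to the supplier.
Invalid bytes outside any <product> block are not caught by this filter
and would still make the parser fail.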

Thank you and best regards.