XML::LibXML error handling for non-UTF-8 data
by Ibrahim Dawud
Jun 6 2006 3:53AM
Dear Colleagues,
We communicate with our suppliers via XML messages over the web.
We currently use XML::LibXML (version 1.58) to parse our incoming messages:
Example XML:
<body>
  <product>
    <productID>3661</productID>
    <price>100</price>
    <name>Name</name>
    <categoryName>Name</categoryName>
    <categoryID>28</categoryID>
    .....
  </product>
  .....
</body>
Occasionally, a supplier sends an XML message in which a product detail
contains bytes that are not valid UTF-8. The wrong encoding of that one
data element makes the whole XML document not well-formed, and the
parser reports the following error:
":1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x91 0x65 0x61 0x73 "
Parsing then stops at that point.
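
To see where the offending bytes sit in a message before it reaches the
parser, a byte-level check may help. The following is only a sketch using
the core Encode module; first_invalid_utf8_offset is a hypothetical helper
name, and the sample string is made up to mirror the bytes in the error
above. (0x91 is a left single quotation mark in Windows-1252, which hints
that the data may have been written in that encoding.)

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the byte offset of the first sequence that is
# not valid UTF-8, or undef if the whole byte string decodes cleanly.
sub first_invalid_utf8_offset {
    my ($bytes) = @_;
    my $rest = $bytes;   # FB_QUIET overwrites its input, so use a copy
    # With FB_QUIET, decode() stops at the first malformed sequence and
    # leaves everything from that point on in $rest.
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET);
    return length($rest) ? length($bytes) - length($rest) : undef;
}

my $sample = "<name>pl\x91eas</name>";   # made-up data containing byte 0x91
my $offset = first_invalid_utf8_offset($sample);
print defined($offset)
    ? "first invalid byte at offset $offset\n"
    : "message is valid UTF-8\n";
```

The offset refers to the raw byte string as received, so the message must
be checked before any decoding has been applied to it.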
We tried enabling recovery mode ( $parser->recover(1); ) so that the
parser skips over the error.
Unfortunately, this is not enough: we have no way of knowing that an
error occurred and that the parser returned bad data for that product.
We cannot validate the product data ourselves either, since no specific
data format is expected.
What we need is some sort of error handling within the parsing method
that will detect non-UTF-8 data, raise an error, and then SKIP over
the whole product in the XML block that contains that error, and
continue to parse the rest of the XML document normally.
We would appreciate your advice on how to solve this problem, or on a
workaround.
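
One possible workaround, sketched here under several assumptions rather
than offered as the XML::LibXML way of doing it, is to check each
<product> block for valid UTF-8 before any XML parsing takes place. The
sketch assumes the message is meant to be UTF-8, that <product> elements
are not nested and carry no attributes, and that no CDATA section contains
the text "</product>". partition_products is a hypothetical helper name,
and the sample message is invented.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: split the raw message (a byte string) into <product>
# blocks and separate those that are valid UTF-8 from those that are not.
sub partition_products {
    my ($xml_bytes) = @_;
    my (@good, @bad);
    while ($xml_bytes =~ m{(<product>.*?</product>)}gs) {
        my $chunk = $1;
        # With a CHECK argument decode() may modify its input in place,
        # so the check is run on a copy of the chunk.
        my $copy = $chunk;
        if (eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 }) {
            push @good, $chunk;   # decodes cleanly
        }
        else {
            push @bad, $chunk;    # contains malformed UTF-8
        }
    }
    return (\@good, \@bad);
}

# Invented sample: the second product contains the stray byte 0x91.
my $msg = "<body>"
        . "<product><productID>1</productID><name>ok</name></product>"
        . "<product><productID>2</productID><name>\x91eas</name></product>"
        . "</body>";

my ($good, $bad) = partition_products($msg);
printf "%d clean product(s), %d rejected\n", scalar @$good, scalar @$bad;
```

The clean blocks could then be wrapped in <body>...</body> again and passed
to parse_string as before, while the rejected blocks are logged or returned
to the supplier. A regular expression is a blunt tool for XML, so this only
suits messages with a simple, predictable layout like the example above.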
Example of the Perl code used to parse the message:
###################################################################
use XML::LibXML;

# $xml_msg holds the raw XML message received from the supplier.
my $parser = XML::LibXML->new();              # create the parser object
# $parser->recover(1);
my $tree = $parser->parse_string($xml_msg);   # parse the XML message
my $root = $tree->getDocumentElement;         # get the root element <body>
my $count = 0;
my @ResultSet = ();
# Collect one row per <product>; index 2 of each row is left unused.
foreach my $product ($root->findnodes('product')) {
    $ResultSet[$count][0] = $product->findvalue('productID');
    $ResultSet[$count][1] = $product->findvalue('name');
    $ResultSet[$count][3] = $product->findvalue('categoryName');
    $ResultSet[$count][4] = $product->findvalue('categoryID');
    $ResultSet[$count][5] = $product->findvalue('price');
    $count++;
}
###################################################################
Thank you and best regards.