ASPN ActiveState Programmer Network
soapbuilders
FW: UTF-8 BOM (was RE: [soapbuilders] Follow-up UTF-8 test)
Apr 2 2001 10:39PM
We found that we had a problem with this in the Apache implementation when
we used a Reader, rather than a raw InputStream, to create the InputSource
for the XML parser.  The Reader apparently got confused by the BOM and consumed
only part of it, so the parser couldn't handle what was left.  When we switched to
sending the InputStream directly to the parser (in our case, Xerces), all
was well.  Just an FYI, in case this might have something to do with the
problem you're seeing.
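
As a rough illustration of the "hand the parser the raw InputStream" approach, here is a minimal sketch using the standard JAXP API. The class and method names (BomParseSketch, withBom, rootName) are made up for this example and are not taken from the Apache code; the assumption is that the parser behind DocumentBuilderFactory does its own encoding detection on a byte stream, as Xerces does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class BomParseSketch {

    // Prefixes the UTF-8 encoding of xml with the BOM bytes EF BB BF.
    static byte[] withBom(String xml) {
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Gives the parser a byte stream (not a Reader), so the parser itself
    // decides how to decode the bytes, including any leading BOM.
    static String rootName(byte[] message) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream in = new ByteArrayInputStream(message);
        Document doc = builder.parse(new InputSource(in));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = withBom(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Envelope/>");
        System.out.println(rootName(msg));
    }
}
```

The point of the sketch is only that InputSource is built from the InputStream; wrapping the stream in an InputStreamReader first would move the decoding (and the BOM handling) out of the parser's hands.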

--Glen

-----Original Message-----
From: Michael Brennan [mailto:michael_brennan@[...]..]
Sent: Monday, April 02, 2001 6:07 PM
To: 'soapbuilders@yahoogroups.com'
Subject: UTF-8 BOM (was RE: [soapbuilders] Follow-up UTF-8 test)


Thanks for the reference. I've overlooked that (and thought that UTF-8 never
includes a BOM).

Interestingly, the XML parser Sun ships with JAXP chokes on this. Now there
is one more thing to test for conformance: XML parsers.

Looks like this one is a bug in Sun's parser.  :-(

I wonder how many other XML parsers have problems with this.

-----Original Message-----
From: Fredrik Lundh [mailto:fredrik@[...]..]
Sent: Saturday, March 31, 2001 1:04 AM
To: soapbuilders@[...].com
Subject: Re: [soapbuilders] Follow-up UTF-8 test


michael wrote:
>  However, I am still seeing one odd problem: the returned message seemed to
>  have some garbage bytes preceding the XML prolog. It appears to be 3 bytes
>  whose hex values are: EFBBBF.
> 
>  I've seen this same sequence of bytes when I save a file in UTF-8 format
>  using Notepad.
> 
>  Any idea what's happening here?

it's a unicode BOM (byte order mark).  it's not necessary for
UTF-8, but your parser shouldn't choke on it.
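
for a parser (or Reader) that does choke, one possible workaround is to drop the three bytes before the parser sees them.  a minimal sketch, assuming the input is UTF-8; the class and method names (Utf8BomSkipper, skipBom) are invented for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper class, for illustration only.
class Utf8BomSkipper {

    // Returns a stream positioned just after a leading EF BB BF, if present.
    // If the stream does not start with those bytes, nothing is lost.
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop
        // until three bytes have arrived or the stream ends.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipBom(new ByteArrayInputStream(data));
        System.out.println((char) in.read());
    }
}
```

this only looks for the UTF-8 signature; UTF-16 input starts with a different BOM and would pass through untouched.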

more info here:

    http://www.unicode.org/unicode/faq/utf_bom.html

also see appendix F of the XML spec.

Cheers /F

