Re: Re: XML::Parser & "invalid character"
by Duncan Cameron
Mar 29 2002 10:21PM
On 2002-03-29 Jenda Krynicky wrote:
> From: Duncan Cameron <dcameron@[...].uk>
> > On 2002-03-29 Jenda Krynicky wrote:
> > >I'm using XML files to replicate some settings and other stuff
> > >between several servers (thanks for your previous help!). Everything
> > >is cool except one thing.
> > >
> > >I use the character with code 2 as a separator or marker in several
> > >places in the database.
> > >The problem is that if I write a string containing a chr(2) into an
> > >XML file, XML::Parser (used via XML::Simple) will refuse to parse the
> > >file.
>
> > XML doesn't allow such a character value, see the XML character
> > definition http://www.w3.org/TR/2000/REC-xml-20001006#charsets
>
> Aaaagrrrrrrr. Someone thought they're clever ...
>
> Who would it hurt if the parsers allowed &#anynumber; ?
>
> > >And if not, what character would you recommend using
> > >(escaped if necessary) to make XML::Parser happy, while still being
> > >reasonably sure that it will not be mistaken for "normal" data?
> > That depends on what your application defines as 'normal data'.
> > Not sure that I fully understand what you want to do so I can't really
> > suggest anything.
>
> Basically all I want is to write some data from the database on one
> computer to a file and read it in on another. I did not expect to
> be restricted to "text only".
>
> Anyway, thanks to a suggestion by Chris Strom I'll do it this way ...
>
> 1) if the string doesn't contain any "forbidden" characters :
>
> escape what is necessary and write "<TAG>$text</TAG>"
>
> 2) otherwise
>
> escape & and > as usual, escape the forbidden ones as
> &#...;, write "<TAG><![CDATA[$text]]></TAG>" to keep
> the escapes away from XML::Parser and when reading
> unescape myself.
>
> It's not nice, but it gets the job done.
>
You're right, it's not nice, and you might find further problems downstream.
You appear to be building in too much 'magic'. Bear in mind that &#2; is not a
valid XML character reference. Every time that you or someone else processes the
parsed data, these pseudo character references then have to be expanded by hand.
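
To make the restriction concrete, here is a minimal sketch of a check against the XML 1.0 Char production. The helper name is_xml_safe and the variable $INVALID_XML_CHAR are made up for illustration and are not part of XML::Parser or XML::Simple.

```perl
use strict;
use warnings;

# Characters permitted by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside this set may not appear in a well-formed document,
# not even written as a numeric character reference.
my $INVALID_XML_CHAR =
    qr/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: true if $text contains only characters that XML 1.0
# allows (the usual escaping of &, < and > is a separate step).
sub is_xml_safe {
    my ($text) = @_;
    return $text !~ $INVALID_XML_CHAR;
}

print is_xml_safe("plain text")       ? "ok\n" : "invalid\n";   # prints "ok"
print is_xml_safe("a" . chr(2) . "b") ? "ok\n" : "invalid\n";   # prints "invalid"
```

A check like this only tells you which strings need special treatment; it does not make chr(2) representable in the file.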
If your database data can contain 'binary' values then you might consider encoding
all fields as base64, or at least those fields which may contain binary data.
Base64 encoding increases the size of your data by about one third but is otherwise
a clean way to do it.
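
A minimal sketch of the base64 approach, using the MIME::Base64 module. The helper names field_to_element and text_to_field are invented for illustration, the element name TAG is borrowed from the thread, and the values are assumed to be byte strings (encode_base64 dies on characters above 255).

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: wrap a possibly binary field value in an element.
sub field_to_element {
    my ($value) = @_;
    # The empty second argument stops encode_base64 from adding line breaks.
    return '<TAG>' . encode_base64($value, '') . '</TAG>';
}

# Hypothetical helper: recover the original value from the element text
# handed back by the parser.
sub text_to_field {
    my ($text) = @_;
    return decode_base64($text);
}

# A value using chr(2) as a separator, as described in the thread.
my $value   = join chr(2), 'alpha', 'beta', 'gamma';
my $element = field_to_element($value);
print "$element\n";

# Stand-in for a real parser, used only to keep the sketch self-contained.
my ($text) = $element =~ m{^<TAG>(.*)</TAG>$};
print text_to_field($text) eq $value ? "round trip ok\n" : "mismatch\n";
```

The base64 alphabet consists of letters, digits, '+', '/' and '=', so the encoded text needs no XML escaping and no CDATA section.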
Regards,
Duncan Cameron
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs