[TCLCORE] [SPAM] [6.0] Re: base64 in the core?
by Trevor Davel
Apr 27 2008 5:03AM
Hi,
I would also like to see base64 in the core, primarily because the pure
Tcl implementation in Tcllib is slow and heavy on memory.
I don't think 'encoding' is the right place for it though - that is for
converting between representations of character sets, not for high level
"text codecs" (if I can call them that).
I would think the most appropriate location is binary scan / binary
format, alongside the bin <-> hex conversion. IIRC Perl's base64 support
is exposed in pretty much this manner.
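For reference, a minimal sketch of the existing bin <-> hex conversion that a base64 conversion would sit next to. Only the hex format letter (H) shown here exists; any base64 counterpart in binary scan / binary format is an assumption about a possible future API, so it is not shown.

```tcl
# Hex-encode raw bytes with binary scan, then decode with binary format.
set data [binary format c* {72 105 33}]   ;# three bytes: 0x48 0x69 0x21
binary scan $data H* hex                   ;# hex digits, high nibble first
set back [binary format H* $hex]           ;# convert the hex digits back to bytes
puts "$hex [string equal $data $back]"
```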
Another possibility may be a new core ensemble for "text codecs" that
would include base64 & hex support in the core, and could be extended
(e.g. in Tcllib) to include base32, quoted-printable, and other common
encodings.
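A rough sketch of what such an extensible ensemble might look like, written in pure Tcl 8.5 for illustration only. The name "codec", its subcommands (encode, decode, register) and the helper procs are all hypothetical; nothing like this exists in the core.

```tcl
# Hypothetical ensemble: codec encode <name> <data> / codec decode <name> <data>.
namespace eval ::codec {
    namespace export encode decode register
    namespace ensemble create
    variable encoders
    variable decoders
    array set encoders {}
    array set decoders {}

    # Register a codec under a name, with one command prefix per direction.
    proc register {name encodeCmd decodeCmd} {
        variable encoders
        variable decoders
        set encoders($name) $encodeCmd
        set decoders($name) $decodeCmd
    }
    # Dispatch to the registered encoder for the named codec.
    proc encode {name data} {
        variable encoders
        return [{*}$encoders($name) $data]
    }
    # Dispatch to the registered decoder for the named codec.
    proc decode {name data} {
        variable decoders
        return [{*}$decoders($name) $data]
    }

    # Hex codec built on binary scan / binary format (helper names are made up).
    proc HexEncode {data} { binary scan $data H* hex ; return $hex }
    proc HexDecode {hex} { return [binary format H* $hex] }
    register hex ::codec::HexEncode ::codec::HexDecode
}

puts [codec encode hex "Tcl"]
```

An extension could then add its own codecs from outside the core, for example "codec register base64 ::base64::encode ::base64::decode", assuming Tcllib's base64 package has been loaded.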
Aside: re my comment on the tcllib base64 being slow:
I have recently played with various encodings to store large objects
in databases. Converting a 2MB file to hex took 0.04s, and to base64
took 4s (two orders of magnitude slower). Converting to escaped SQL
took 17s for a naive implementation, but was possible in 2s through
judicious use of 'string map'. With a 30MB file only the hex
implementation could complete on a system with 1GB RAM, due to high
memory use.
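To illustrate the difference between the two escaping approaches, here is a small sketch. The actual escaping rules used in the tests above are not given, so doubling single quotes is an assumption, and the proc names escape_naive and escape_mapped are made up for the example.

```tcl
# Naive approach: visit every character and build the result with append.
proc escape_naive {str} {
    set out {}
    foreach c [split $str {}] {
        if {$c eq "'"} {
            append out "''"
        } else {
            append out $c
        }
    }
    return $out
}

# string map approach: one pass over the whole string inside the core.
proc escape_mapped {str} {
    return [string map {' ''} $str]
}

puts [escape_mapped "it's"]
```

Both procs return the same result; the second avoids creating a list element per character.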
My point really is that Tcl is exceptionally poor at processing strings
as raw bytes - the typical approach seems to be 'foreach c [split $str
{}]' and then - if doing some sort of encoding or decoding - build a new
string using append. Because everything is a string, I have encountered
a lot of algorithms that perform poorly as a direct result of this
approach. It would seem that Tcl would perform a lot better if it
offered another idiom for string parsing and/or recoding. I don't have
any suggestions at the moment ;/ but I'd be interested to hear other
opinions on the matter.
For interest I ran three tests, each loading the same 25MB data file
(26569888 bytes) and then processing it:
(Tests on WinXP SP2 using Tcl 8.5a6)
Mem usage after starting Tcl: 7508 / 7512 (peak) / 3692 (vm)
Mem usage after loading 25MB file: 33604 / 40452 (peak) / 36572 (vm)
foreach/split:
    set a {}
    time { foreach c [split $data {}] { append a $c } ; puts [string length $a] }
    Time 55.67s
    Mem usage after operation: 164368 / 270080 (peak) / 199952 (vm)
for/string index:
    set a {} ; set dlen [string length $data]
    time { for { set i 0 } { $i < $dlen } { incr i } { append a [string index $data $i] } ; puts [string length $a] }
    Time 88.25s
    Mem usage after operation: 124996 / 124996 (peak) / 160716 (vm)
binary scan to hex:
    time { binary scan $data H* a ; puts [string length $a] }
    Mem usage after operation: 85684 / 85684 (peak) / 88572 (vm)
(*) Note that the first two tests are identity transforms while this
produces a string twice the length of the data, and still has the lowest
memory use.
Perhaps I do have a suggestion: a "string foreach c $str $startofs
$endofs { ... }" would be the equivalent of the for/string-index
approach but not suffer from having to do a 'string index' on each
iteration. And a way to provide a hint about the expected size of a
non-shared string may also help?
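The intended semantics of such a command could be sketched in pure Tcl as below. This only illustrates the behaviour: it still calls 'string index' on every iteration, so it has none of the performance benefit a core implementation would aim for. The proc name string_foreach and the choice of an inclusive integer end offset are assumptions.

```tcl
# Hypothetical pure-Tcl stand-in for the proposed "string foreach".
# Runs body once per character of str from startOfs to endOfs (inclusive),
# with the current character stored in the caller's variable varName.
proc string_foreach {varName str startOfs endOfs body} {
    upvar 1 $varName c
    for {set i $startOfs} {$i <= $endOfs} {incr i} {
        set c [string index $str $i]
        uplevel 1 $body
    }
}

set out {}
string_foreach ch "abcdef" 1 3 { append out $ch }
puts $out
```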
Regards,
Twylite
> Date: Sat, 26 Apr 2008 10:25:05 +0200 (CEST)
> From: <suchenwi@[...].de>
> Subject: [TCLCORE] base64 in the core?
> To: <donal.k.fellows@[...].uk>, <lluisgomez@[...].org>
> Cc: tcl-core@[...].net
> Message-ID: <200804260825.m3Q8P4wI008306@[...].de>
> Content-Type: text/plain; charset="us-ascii"
>
> I expect it isn't much code, basically repackaging 8-bit bytes in 6-bit chars. Biggest q is probably the API - one cutesy idea is
> encoding convert{from,to} base64
> :^)
>
>
> Best regards, Richard Suchenwirth-Bauersachs http://wiki.tcl.tk/RS
>
>