tcl-core
[TCLCORE] Re: base64 in the core?
by Trevor Davel
Apr 27 2008 5:03AM
Hi,

I would also like to see base64 in the core, primarily because the pure 
Tcl implementation in Tcllib is slow and heavy on memory.

I don't think 'encoding' is the right place for it though - that is for 
converting between representations of character sets, not for high level 
"text codecs" (if I can call them that).

I would think the most appropriate location is binary scan / binary 
format, like the bin <-> hex conversion.  IIRC Perl's base64 support is 
exposed in pretty much this manner.
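
For reference, the existing bin <-> hex conversion looks roughly like the sketch below. The sample bytes are chosen only for illustration; the point is that both directions run inside 'binary' rather than in a script-level loop.

```tcl
# Round-trip raw bytes through hex using the built-in binary command.
set data [binary format a* "Hello"]   ;# five raw bytes (illustrative sample)

# bytes -> hex digits (H = high-nibble-first hex, * = all bytes)
binary scan $data H* hex

# hex digits -> bytes
set back [binary format H* $hex]

puts $hex                        ;# prints 48656c6c6f
puts [string equal $data $back]  ;# prints 1
```

A base64 conversion exposed the same way would presumably follow this pattern, with a different format specifier or subcommand.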

Another possibility may be a new core ensemble for "text codecs" that 
would include base64 & hex support in the core, and could be extended 
(e.g. in Tcllib) to include base32, quoted-printable, and other common 
encodings.

Aside: re my comment on the tcllib base64 being slow:

I have recently played with various encodings to store large objects 
in databases.  Converting a 2 MB file to hex took 0.04s, and to base64 
took 4s (two orders of magnitude slower).  Converting to escaped SQL 
took 17s for a naive implementation, but was possible in 2s through 
judicious use of 'string map'.  With a 30 MB file, only the hex 
implementation could complete on a system with 1 GB of RAM; the others 
failed due to high memory use.
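
To illustrate the difference between the two SQL-escaping approaches, here is a minimal sketch. The escape rules (doubling single quotes and backslashes) and the proc names escape_naive and escape_map are assumptions made for the example; the actual rules used in the measurements above are not stated.

```tcl
# Naive approach: split into characters and rebuild the string with append.
# Escape rules here are hypothetical: ' -> '' and \ -> \\
proc escape_naive {str} {
    set out {}
    foreach c [split $str {}] {
        if {$c eq "'"} {
            append out "''"
        } elseif {$c eq "\\"} {
            append out "\\\\"
        } else {
            append out $c
        }
    }
    return $out
}

# Same (hypothetical) rules expressed as a single 'string map' call,
# which does the whole scan-and-replace in C.
proc escape_map {str} {
    return [string map [list "'" "''" "\\" "\\\\"] $str]
}

puts [escape_map "O'Neil"]   ;# prints O''Neil
```

Both procs should return the same result for any input; only the amount of script-level work per character differs.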

My point really is that Tcl is exceptionally poor at processing strings 
as raw bytes - the typical approach seems to be 'foreach c [split $str 
{}]' and then - if doing some sort of encoding or decoding - build a new 
string using append.  Because everything is a string I have encountered 
a lot of algorithms that exhibit poor performance because of this 
approach.  It would seem that Tcl would perform a lot better if it 
offered another idiom for string parsing and/or recoding.  I don't have 
any suggestions at the moment ;/ but I'd be interested to hear other 
opinions on the matter.

For interest I ran three tests, each loading the same 25 MB data file 
(26569888 bytes) and then processing it
(tests on WinXP SP2 using Tcl 8.5a6):
Mem usage after starting Tcl: 7508 / 7512 (peak) / 3692 (vm)
Mem usage after loading 25 MB file: 33604 / 40452 (peak) / 36572 (vm)

foreach/split:
    set a {}
    time { foreach c [split $data {}] { append a $c } ; puts [string length $a] }
Time 55.67s
Mem usage after operation: 164368 / 270080 (peak) / 199952 (vm)

for/string index:
    set a {}
    set dlen [string length $data]
    time { for { set i 0 } { $i < $dlen } { incr i } { append a [string index $data $i] } ; puts [string length $a] }
Time 88.25s
Mem usage after operation: 124996 / 124996 (peak) / 160716 (vm)

binary scan to hex:
    time { binary scan $data H* a ; puts [string length $a] }
Mem usage after operation: 85684 / 85684 (peak) / 88572 (vm)
(*) Note that the first two tests are identity transforms while this 
produces a string twice the length of the data, and still has the lowest 
memory use.

Perhaps I do have a suggestion: a "string foreach c $str $startofs 
$endofs { ... }" would be the equivalent of the for/string-index 
approach but not suffer from having to do a 'string index' on each 
iteration.  And a way to provide a hint about the expected size of a 
non-shared string may also help?
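
As a rough illustration of the calling convention only, the proposed command can be emulated in script. The proc name string_foreach is hypothetical, and treating $endofs as inclusive is an assumption. This emulation still pays for 'string index' on every iteration, which is exactly the cost a core implementation would avoid.

```tcl
# Hypothetical script-level emulation of the proposed "string foreach".
# Not a core command; endOfs is assumed to be an inclusive index.
proc string_foreach {varName str startOfs endOfs body} {
    upvar 1 $varName c
    for {set i $startOfs} {$i <= $endOfs} {incr i} {
        set c [string index $str $i]
        # Run the body in the caller's scope; break and continue raised
        # there are handled by the enclosing 'for'.
        uplevel 1 $body
    }
}

# Usage: visit characters 1..3 of the string.
set out {}
string_foreach ch "abcdef" 1 3 { append out $ch }
puts $out   ;# prints bcd
```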

Regards,
Twylite
>  Date: Sat, 26 Apr 2008 10:25:05 +0200 (CEST)
>  From: <suchenwi@[...].de>
>  Subject: [TCLCORE] base64 in the core?
>  To: <donal.k.fellows@[...].uk>, <lluisgomez@[...].org>
>  Cc: tcl-core@[...].net
>  Message-ID: <200804260825.m3Q8P4wI008306@[...].de>
>  Content-Type: text/plain; charset="us-ascii"
> 
>  I expect it isn't much code, basically repackaging 8-bit bytes in 6-bit chars.
>  Biggest q is probably the API - one cutesy idea is 
>  encoding convert{from,to} base64
>  :^)
> 
> 
>  Best regards, Richard Suchenwirth-Bauersachs  http://wiki.tcl.tk/RS
> 
>    
