ASPN ActiveState Programmer Network
ActiveState
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups


Recent Messages
List Archives
About the List
List Leaders
Subscription Options

View Subscriptions
Help

View by Topic
ActiveState
.NET Framework
Open Source
Perl
PHP
Python
Tcl
Web Services
XML & XSLT

View by Category
Database
General
SOAP
System Administration
Tools
User Interfaces
Web Programming
XML Programming


MyASPN >> Mail Archive >> python-list
python-list
Re: Surprising (for me) benchmark results...
by Courageous other posts by this author
May 2 2001 9:45AM messages near this date
Re: Surprising (for me) benchmark results... | Re: what's in a name (was Re: using lambda to print everything ina list)
On Wed, 02 May 2001 01:11:50 GMT, "Rainer Deyke" <root@[...].com>  wrote:

> "Daniel Berlin" <dan@[...].com> wrote in message
> news:mailman.988762570.8591.python-list@[...]..
> > Actually, the sort is probably not the bottleneck.
> > As the file gets larger, the bottleneck becomes disk time, not sort time.
> 
> Disk access is O(N); sorting is typically O(N log N).  Therefore as the
> number of entries in the file size increases, the time taken by the sort
> becomes more significant and the time taken by disk access becomes less
> significant.  This is assuming that the file fits into memory (as was the
> case here).

Disregarding the algorithm and this specific situation in question, you are
both correct; this is because while disk access may scale O(N), this
scaling is multiplied by a huge constant factor that can in many cases
dwarf the sorting algorithm, depending on the number of entries and
the like. Things like this are exactly what I was alluding to when I referenced
"primary and secondary complexity factors" in my other message.

At the moment, for example, I'm working on a modification to Python's
internal hash tables that uses a polymorphic dictionary, where for very
small dictionaries, a simple array is used. Using array logic for lookups
on very small arrays allows me to manually unroll the loops; hidden
complexity costs are elminated in this fashion, and maximum parallelism
is gained. 

Preliminary statistical analysis of the python interpreter show that the vast
majority of all Python dictionaries during the life of the interpreter are of
n < 8, with the majority of those being n < 4. At the moment I've sped up
these common cases by 50%; I'm toying around with MMX/SSE assembly
to see if I can even make them faster.

[N.B.: this is a preliminary finding, and hasn't been well-integrated into
Python yet, so there are still mitigating factors which might undo my ability
to continue with this optimization route].

More later,


Joe Kraska
San Diego
CA

-- 
http://mail.python.org/mailman/listinfo/python-list

Privacy Policy | Email Opt-out | Feedback | Syndication
© ActiveState Software Inc. All rights reserved