numpy-discussion
Re: [Numpy-discussion] Histograms of extremely large data sets
by Rick White
Dec 14 2006 5:31AM
On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote:

>  At some point I might try and test
>  different cache sizes for different data-set sizes and see what the
>  effect is.  For now, 65536 seems a good number and I would be happy to
>  see this replace the current numpy.histogram.

I experimented a little on my machine and found that 64k was a good
block size, but the run time is fairly insensitive to the block size
over a wide range (16000 to 1e6).  I'd be interested to hear how this
scales on other machines -- I'm pretty sure that the ideal size will
keep the piece of the array being sorted smaller than the on-chip
cache.
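
For anyone who wants to time this on their own machine, here is a minimal
sketch of a block-wise sort-and-searchsorted histogram. It is not the code
posted in this thread: the function name chunked_histogram and the parameter
name block are made up for illustration, and it assumes that bins holds
sorted lower bin edges, with the last bin collecting everything at or above
the final edge.

```python
import numpy as np

def chunked_histogram(a, bins, block=65536):
    """Count values of `a` per bin, sorting one block at a time.

    Assumption: `bins` holds sorted lower edges; the last bin takes every
    value >= bins[-1], and values below bins[0] are not counted.
    """
    a = np.ravel(a)
    bins = np.asarray(bins)
    counts = np.zeros(len(bins), dtype=np.intp)
    for start in range(0, a.size, block):
        # Sort only a small piece so the working set can stay in cache.
        chunk = np.sort(a[start:start + block])
        # For each edge, the number of values in the chunk below that edge.
        idx = chunk.searchsorted(bins)
        # Differences of these cumulative counts give the per-bin counts.
        counts += np.diff(np.append(idx, chunk.size))
    return counts
```

The result does not depend on block, so timing the same call with block
values from about 16000 to 1e6 is a straightforward way to compare machines.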

Just so we don't get too smug about the speed, if I do this in IDL on
the same machine it is more than 10 times faster (0.28 seconds instead
of 4 seconds).  I'm sure the IDL version uses the much faster approach
of just sweeping through the array once, incrementing counts in the
appropriate bins.  It only handles equal-sized bins, so it is not as
general as the numpy version -- but equal-sized bins are a very common
case.  I'd still like to see a C version of histogram (which I guess
would need to be a ufunc) go into the numpy core.
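
To make the single-sweep idea concrete, here is a rough sketch for
equal-sized bins. It is not IDL's implementation and not a proposed numpy
API: the name equal_width_histogram and its arguments lo, hi and nbins are
hypothetical, and it assumes a half-open range [lo, hi) with values outside
that range dropped. The bin index comes straight from arithmetic on each
value, and np.bincount does the incrementing, so no sort is needed.

```python
import numpy as np

def equal_width_histogram(a, lo, hi, nbins):
    """Histogram with `nbins` equal-width bins covering [lo, hi).

    Assumption: values outside [lo, hi) are ignored.
    """
    a = np.ravel(np.asarray(a, dtype=float))
    width = (hi - lo) / float(nbins)
    # Keep only the values that fall inside the histogram range.
    inside = (a >= lo) & (a < hi)
    # Each value maps directly to its bin number; no sorting required.
    idx = ((a[inside] - lo) / width).astype(np.intp)
    # Guard against floating-point round-up into a nonexistent bin.
    np.minimum(idx, nbins - 1, out=idx)
    # bincount sweeps through idx once, incrementing one counter per value.
    return np.bincount(idx, minlength=nbins)
```

A C version would presumably fuse the index computation and the increment
into one loop over the data, avoiding the temporary arrays created here.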
					Rick
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@[...].org
http://projects.scipy.org/mailman/listinfo/numpy-discussion