Re: Surprising (for me) benchmark results...
by John J. Lee other posts by this author
May 2 2001 1:14PM messages near this date
SOAP.py 0.9
|
Re: Surprising (for me) benchmark results...
On Tue, 1 May 2001, Daniel Berlin wrote:
> On Wed, 2 May 2001, Rainer Deyke wrote:
> > "Courageous" <jkraska1@[...].com> wrote in message
> > news:rrouet0f4un3h3eo8pq6nlk5idgkmhhhca@[...]..
> > > On Wed, 02 May 2001 01:11:50 GMT, "Rainer Deyke" <root@[...].com>
> > wrote:
> > > >"Daniel Berlin" <dan@[...].com> wrote in message
> > > >news:mailman.988762570.8591.python-list@[...]..
> > > >> Actually, the sort is probably not the bottleneck.
> > > >> As the file gets larger, the bottleneck becomes disk time, not sort
> > time.
> > > >Disk access is O(N); sorting is typically O(N log N). Therefore as the
[...]
> > > Disregarding the algorithm and this specific situation in question, you
> > are
> > > both correct; this is because while disk access may scale O(N), this
> > > scaling is multiplied by a huge constant factor that can in many cases
> > > dwarf the sorting algorithm, depending on the number of entries and
> > > the like.
The best answer, of course. Mind you, I don't know whether the curves
*ever* cross for feasible N. Does anyone here know, though direct
experience or calculation? The other thing is that for small changes in
N, disk access doesn't necessarily go as N (caches, that sort of thing?),
and maybe sorting doesn't go exactly as N ln N.
> > I was responding to this: "As the file gets larger, the bottleneck becomes
> > disk time, not sort time." This is just plain incorrect (assuming larger
> > file = more entries, not longer entries).
> No, it's not.
> You still have to get your damn file off the disk.
> He was timing a sorting program in it's entirety, not the sort.
> Therefore, the disk i/o time matters. And with it taking such a short time
> for each program run, it could possibly matter a *whole lot*.
> > If disk access dominates on large
> > files, it dominates even more on small files.
> Err, that doesn't make it not dominate on small files.
And if it (disk access) dominates on small files, then it dominates less
on large files, until eventually the sort dominates. Mm?
To repeat what you said again:
> > > >> As the file gets larger, the bottleneck becomes disk time, not sort
> > time.
(you can't get any more smug and annoying than a good 'Mm?', can you? :)
John
--
http://mail.python.org/mailman/listinfo/python-list
Thread:
John J. Lee
Daniel Berln
John J. Lee
Brian Quinlan
John J. Lee
Daniel Berlin
John J. Lee
|