Re: [Python-Dev] When do sets shrink?
by Noam Raphael other posts by this author
Dec 31 2005 3:08PM messages near this date
Re: [Python-Dev] When do sets shrink?
|
Re: [Python-Dev] When do sets shrink?
On 12/31/05, Raymond Hettinger <raymond.hettinger@[...].net> wrote:
> [Noam]
> > For example, iteration over a set which once had
> > 1,000,000 members and now has 2 can take 1,000,000 operations every
> > time you traverse all the (2) elements.
>
> Do you find that to be a common or plausible use case?
I don't have a concrete example in this minute, but a set which is
repeatedly filled with elements and then emptied by pop operations
doesn't seem to me that far-fetched.
>
> Was Guido's suggestion of s=set(s) unworkable for some reason? dicts
> and sets emphasize fast lookups over fast iteration -- apps requiring
> many iterations over a collection may be better off converting to a list
> (which has no dummy entries or empty gaps between entries).
It's workable, but I think that most Python programmers haven't read
that specific message, and are expecting operations which should take
a short time to take a short time. Converting to a list won't help the
use-case above, and anyway, it's another thing that I wouldn't expect
anyone to do - there's no reason that iteration over a set should take
a long time.
(I'm speaking of my point of view, which I believe is common. I don't
expect programs I write in Python to be super-fast - if that were the
case, I would write them in C. I do expect them to take a reasonable
amount of time, and in the case of iteration over a set, that means a
time proportional to the number of elements in the set.)
>
> Would the case be improved by incurring the time cost of 999,998 tests
> for possible resizing (one for each pop) and some non-trivial number of
> resize operations along the way (each requiring a full-iteration over
> the then current size)?
>
I believe it would. It seems to me that those 999,998 tests take not
much more than a machine clock, which means about 1 milisecond on
todays computers. Those resize operations will take some more
miliseconds. It all doesn't really matter, since probably all other
things will take much more. I now run this code
> >> s = set()
> >> for j in xrange(1000000):
... s.add(j)
...
> >> while s:
... tmp = s.pop()
...
And it took 2.4 seconds to finish. And it's okay - I'm just saying
that a few additional clock ticks per operation will usually not
matter when the overall complexity is the same, but changes in order
of complexity can matter much more.
> Even if this unique case could be improved, what is the impact on common
> cases? Would a downsizing scheme risk thrashing with the
> over-allocation scheme in cases with mixed adds and pops?
>
I think that there shouldn't be additional damage beyond those clock
ticks. The simple method I took from "Introduction to Algorithms"
works no matter what sequence of adds and pops you have.
> Is there any new information/research beyond what has been obvious from
> the moment the dict resizing scheme was born?
I wanted to say that there isn't any new information, and yet I don't
think that I have to assume that everything in current Python is the
best that can be. All I did was finding another reason why a
downsizing scheme might be good, and posting it to ask if people have
thought about it. If you have a document listing all the design
decisions that went into dict implementation, then please send it to
me and I won't ask about things that were already thought about.
But the answer is, yes. I beleive that the current dict resizing
scheme was born before the iterator protocol was introduced, and it
may be a reason why the current scheme doesn't try to minimize the
number of empty hash table entries.
Noam
_______________________________________________
Python-Dev mailing list
Python-Dev@[...].org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev-ml%40activestate.c
om
Thread:
Noam Raphael
Noam Raphael
Fernando Perez
Raymond Hettinger
Raymond Hettinger
Noam Raphael
Tim Peters
Josiah Carlson
Raymond Hettinger
"Martin v. Löwis"
Fredrik Lundh
Noam Raphael
Josiah Carlson
Donovan Baarda
Fredrik Lundh
Noam Raphael
"Martin v. Löwis"
Adal Chiriliuc
1
Adal Chiriliuc
Raymond Hettinger
Donovan Baarda
Noam Raphael
Guido van Rossum
Noam Raphael
Josiah Carlson
"Martin v. Löwis"
Noam Raphael
|