|
|
 |
|
Title: peeking into an iterator
Submitter: Unknown or non-registered user
Last Updated: 2004/09/16
Version no: 1.2
Category:
|
|
2 vote(s)
|
|
|
|
Description:
Provides two different ways to look at the items in an iterator without removing them.
The peek function returns two values: the items retrieved and the reconstructed iterator.
The peekable class wraps an iterable as an iterator that provides a peek() method in addition to the usual next().
Source: Text Source
import itertools
import collections
def peek(iterable, n=None):
"""Returns the first item(s) of the iterable and an iterable equivalent
to the original (i.e. without the first item(s) removed).
Example usage:
>>> peek(range(5))
(0, <itertools.chain object at ...>)
>>> first, iterable = peek(range(5))
>>> first
0
>>> list(iterable)
[0, 1, 2, 3, 4]
>>> first, iterable = peek(range(5), 3)
>>> first
[0, 1, 2]
>>> list(iterable)
[0, 1, 2, 3, 4]
>>> first, iterable = peek(range(5), 1)
>>> first
[0]
>>> list(iterable)
[0, 1, 2, 3, 4]
"""
iterable = iter(iterable)
if n is None:
peek_result = iterable.next()
iter_result = [peek_result]
else:
peek_result = iter_result = [iterable.next() for i in range(n)]
return peek_result, itertools.chain(iter_result, iterable)
class peekable:
"""An iterator that supports a peek operation.
Example Usage:
>>> p = peekable(range(4))
>>> p.peek()
0
>>> p.next(1)
[0]
>>> p.peek(3)
[1, 2, 3]
>>> p.next(2)
[1, 2]
>>> p.peek(2)
Traceback (most recent call last):
...
StopIteration
>>> p.peek(1)
[3]
>>> p.next(2)
Traceback (most recent call last):
...
StopIteration
>>> p.next()
3
"""
def __init__(self, iterable):
self._iterable = iter(iterable)
self._cache = collections.deque()
def __iter__(self):
return self
def _fillcache(self, n):
while len(self._cache) < n:
self._cache.append(self._iterable.next())
def next(self, n=None):
self._fillcache(n is None and 1 or n)
if n is None:
result = self._cache.popleft()
else:
result = [self._cache.popleft() for i in range(n)]
return result
def peek(self, n=None):
self._fillcache(n is None and 1 or n)
if n is None:
result = self._cache[0]
else:
result = [self._cache[i] for i in range(n)]
return result
Discussion:
In my recipe for starzip (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302325), I needed to be able to peek at the first element of an iterator without actually removing the element from the iterator. The solution I used for that class is the peek function above. For starzip, the peek function was a reasonable solution -- extract the desired items and reconstruct the iterator using itertools.chain.
Recently, I was working on a simple parsing problem and again I wanted to be able to peek at the elements of an iterator. But this time, I needed to do multiple peeks, at many different points in the iterator. The peek function would have been unwieldy here -- not only would i have had to constantly reassign to my iterator variable, but I would have wrapped the iterator repeatedly in itertools.chain calls, which could mean substantial overhead because of the repeated function calls.
The solution provided by the peekable class avoids the repeated function calls by keeping a cache of items read from the iterator (implemented here as a deque). A side benefit of this approach is that if too many items are requested from the iterator, raising the StopIteration does not throw away the last values of the iterator. For example, while a StopIteration will be raised if next(5) is called when only 3 items are left in the iterator, a call later to next(3) will still be able to return these 3 items.
(The peekable solution is something akin to Danny Yoo's ExtendedIter in http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/197530, but allows for a larger lookahead and substantially simplifies the code by using a deque.)
[14 Sept 2004 -- Removed some comments to condense code.]
[16 Sept 2004 -- Changed default argument as per Peter Otten's suggestion.]
|
|
Add comment
|
|
Number of comments: 5
Minor modification, Peter Otten, 2004/09/15
I suggest that you alter the behaviour of the 'n' argument to default to None. This allows for two ways to peek a single item: peek(it) -> elem and peek(it, 1) -> [elem].
I expect that this is less likely to fail if the n parameter is calculated at runtime.
Sample implementation:
>>> import itertools as it
>>> def peek(iterable, n=None):
... a, b = it.tee(iterable)
... if n is None:
... return a.next(), b
... else:
... return list(it.islice(a, n)), b
...
>>> peek(range(5))
(0, )
>>> list(_[1])
[0, 1, 2, 3, 4]
>>> peek(range(5), 1)
([0], )
>>> list(_[1])
[0, 1, 2, 3, 4]
>>> peek(range(5), 3)
([0, 1, 2], )
>>> list(_[1])
[0, 1, 2, 3, 4]
Add comment
2004/09/16
The idea with None is a good one. I'll update the recipe soon to do it this way.
About itertools.tee though... Looking at the documentation, I couldn't tell what happens when you tee off an iterator and then only use one element of it. The code at http://docs.python.org/lib/itertools-example.html seems to suggest that all the items will be stored until all of the iterators are used up. (Meaning that in the case here, all items but the first will be stored since b is exhausted while a uses only one element.) Do you know what the current implementation does?
Add comment
Peter Otten, 2004/09/23
It seems the "gap" items are only stored until one iterator is garbage-collected. At least the following script had no problems with memory consumption when called with or without the '-p' option:
import sys, itertools
def peek(iterable, n=None):
a, b = itertools.tee(iterable)
if n is None:
return a.next(), b
else:
return list(itertools.islice(a, n)), b
def gen(r=range(10)):
while 1:
for i in r:
yield i
a = gen()
if "-p" in sys.argv:
data, a = peek(a)
for i in xrange(10**6):
a.next()
raw_input("ready")
Add comment
Worried about itertools.chain, Shannon -jj Behrens, 2007/01/16
I'm worried about the following line in peek: return peek_result, itertools.chain(iter_result, iterable) It turns out that this adds another level of indirection. If you're peeking twice per element at a list of elements that won't fit in memory, you end up with too many levels of indirection and a segfault. This happened to me. I used gdb to look at the core, and I literally had tens of thousands of frames on the stack. I only need to peek one element in, so I suggest the following:__docformat__ = "restructuredtext"
class peekable:
"""Make an iterator peekable.
This is implemented with an eye toward simplicity. On the downside,
you can't do things like peek more than one item ahead in the
iterator. On the bright side, it doesn't require anything from
itertools, etc., so it's less likely to encounter strange bugs,
which occassionally do happen.
Example usage::
>>> numbers = peekable(range(6))
>>> numbers.next()
0
>>> numbers.next()
1
>>> numbers.peek()
2
>>> numbers.next()
2
>>> numbers.next()
3
>>> for i in numbers:
... print i
...
4
5
"""
_None = () # Perhaps None is a valid value.
def __init__(self, iterable):
self._iterable = iter(iterable)
self._buf = self._None
def __iter__(self):
return self
def _is_empty(self):
return self._buf is self._None
def peek(self):
"""Peek at the next element.
This may raise StopIteration.
"""
if self._is_empty():
self._buf = self._iterable.next()
return self._buf
def next(self):
if self._is_empty():
return self._iterable.next()
ret = self._buf
self._buf = self._None
return ret
Add comment
Huh?, Steven Bethard, 2007/03/19
Did you look at the peekable class in the recipe? It doesn't use itertools.chain. You shouldn't have any risk of getting too many frames on the stack. (That's pretty much the whole point of using peekable() instead of peek() -- the former is much more efficient for repeated calls.)
Add comment
|
|
|
|
|
 |
|