ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups
Submit Recipe
My Recipes

All Recipes
All Cookbooks


View by Category

Title: peeking into an iterator
Submitter:  Unknown or non-registered user
Last Updated: 2004/09/16
Version no: 1.2
Category:

 

5 stars 2 vote(s)


Description:

Provides two different ways to look at the items in an iterator without removing them.

The peek function returns two values: the items retrieved and the reconstructed iterator.
The peekable class wraps an iterable as an iterator that provides a peek() method in addition to the usual next().

Source: Text Source

import itertools
import collections

def peek(iterable, n=None):
    """Returns the first item(s) of the iterable and an iterable equivalent
    to the original (i.e. without the first item(s) removed).

    Example usage:
    >>> peek(range(5))
    (0, <itertools.chain object at ...>)
    >>> first, iterable = peek(range(5))
    >>> first
    0
    >>> list(iterable)
    [0, 1, 2, 3, 4]
    >>> first, iterable = peek(range(5), 3)
    >>> first
    [0, 1, 2]
    >>> list(iterable)
    [0, 1, 2, 3, 4]
    >>> first, iterable = peek(range(5), 1)
    >>> first
    [0]
    >>> list(iterable)
    [0, 1, 2, 3, 4]
    """
    iterable = iter(iterable)
    if n is None:
        peek_result = iterable.next()
        iter_result = [peek_result]
    else:
        peek_result = iter_result = [iterable.next() for i in range(n)]
    return peek_result, itertools.chain(iter_result, iterable)


class peekable:
    """An iterator that supports a peek operation.

    Example Usage:
    >>> p = peekable(range(4))
    >>> p.peek()
    0
    >>> p.next(1)
    [0]
    >>> p.peek(3)
    [1, 2, 3]
    >>> p.next(2)
    [1, 2]
    >>> p.peek(2)
    Traceback (most recent call last):
      ...
    StopIteration
    >>> p.peek(1)
    [3]
    >>> p.next(2)
    Traceback (most recent call last):
      ...
    StopIteration
    >>> p.next()
    3
    """
    def __init__(self, iterable):
        self._iterable = iter(iterable)
        self._cache = collections.deque()
        
    def __iter__(self):
        return self

    def _fillcache(self, n):
        while len(self._cache) < n:
            self._cache.append(self._iterable.next())
    
    def next(self, n=None):
        self._fillcache(n is None and 1 or n)
        if n is None:
            result = self._cache.popleft()
        else:
            result = [self._cache.popleft() for i in range(n)]
        return result

    def peek(self, n=None):
        self._fillcache(n is None and 1 or n)
        if n is None:
            result = self._cache[0]
        else:
            result = [self._cache[i] for i in range(n)]
        return result

Discussion:

In my recipe for starzip (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302325), I needed to be able to peek at the first element of an iterator without actually removing the element from the iterator. The solution I used for that class is the peek function above. For starzip, the peek function was a reasonable solution -- extract the desired items and reconstruct the iterator using itertools.chain.

Recently, I was working on a simple parsing problem and again I wanted to be able to peek at the elements of an iterator. But this time, I needed to do multiple peeks, at many different points in the iterator. The peek function would have been unwieldy here -- not only would i have had to constantly reassign to my iterator variable, but I would have wrapped the iterator repeatedly in itertools.chain calls, which could mean substantial overhead because of the repeated function calls.

The solution provided by the peekable class avoids the repeated function calls by keeping a cache of items read from the iterator (implemented here as a deque). A side benefit of this approach is that if too many items are requested from the iterator, raising the StopIteration does not throw away the last values of the iterator. For example, while a StopIteration will be raised if next(5) is called when only 3 items are left in the iterator, a call later to next(3) will still be able to return these 3 items.

(The peekable solution is something akin to Danny Yoo's ExtendedIter in http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/197530, but allows for a larger lookahead and substantially simplifies the code by using a deque.)

[14 Sept 2004 -- Removed some comments to condense code.]
[16 Sept 2004 -- Changed default argument as per Peter Otten's suggestion.]



Add comment

Number of comments: 5

Minor modification, Peter Otten, 2004/09/15
I suggest that you alter the behaviour of the 'n' argument to default to None. This allows for two ways to peek a single item: peek(it) -> elem and peek(it, 1) -> [elem]. I expect that this is less likely to fail if the n parameter is calculated at runtime.
Sample implementation:

>>> import itertools as it
>>> def peek(iterable, n=None):
...     a, b = it.tee(iterable)
...     if n is None:
...             return a.next(), b
...     else:
...             return list(it.islice(a, n)), b
...
>>> peek(range(5))
(0, )
>>> list(_[1])
[0, 1, 2, 3, 4]
>>> peek(range(5), 1)
([0], )
>>> list(_[1])
[0, 1, 2, 3, 4]
>>> peek(range(5), 3)
([0, 1, 2], )
>>> list(_[1])
[0, 1, 2, 3, 4]

Add comment

2004/09/16
The idea with None is a good one. I'll update the recipe soon to do it this way.

About itertools.tee though... Looking at the documentation, I couldn't tell what happens when you tee off an iterator and then only use one element of it. The code at http://docs.python.org/lib/itertools-example.html seems to suggest that all the items will be stored until all of the iterators are used up. (Meaning that in the case here, all items but the first will be stored since b is exhausted while a uses only one element.) Do you know what the current implementation does?
Add comment

Peter Otten, 2004/09/23
It seems the "gap" items are only stored until one iterator is garbage-collected. At least the following script had no problems with memory consumption when called with or without the '-p' option:

import sys, itertools

def peek(iterable, n=None):
    a, b = itertools.tee(iterable)
    if n is None:
        return a.next(), b
    else:
        return list(itertools.islice(a, n)), b

def gen(r=range(10)):
    while 1:
        for i in r:
            yield i

a = gen()
if "-p" in sys.argv:
    data, a = peek(a)

for i in xrange(10**6):
    a.next()
raw_input("ready")

Add comment

Worried about itertools.chain, Shannon -jj Behrens, 2007/01/16
I'm worried about the following line in peek:

return peek_result, itertools.chain(iter_result, iterable)
It turns out that this adds another level of indirection. If you're peeking twice per element at a list of elements that won't fit in memory, you end up with too many levels of indirection and a segfault. This happened to me. I used gdb to look at the core, and I literally had tens of thousands of frames on the stack. I only need to peek one element in, so I suggest the following:
__docformat__ = "restructuredtext"


class peekable:

    """Make an iterator peekable.

    This is implemented with an eye toward simplicity.  On the downside,
    you can't do things like peek more than one item ahead in the
    iterator.  On the bright side, it doesn't require anything from
    itertools, etc., so it's less likely to encounter strange bugs,
    which occassionally do happen.

    Example usage::

        >>> numbers = peekable(range(6))
        >>> numbers.next()
        0
        >>> numbers.next()
        1
        >>> numbers.peek()
        2
        >>> numbers.next()
        2
        >>> numbers.next()
        3
        >>> for i in numbers:
        ...     print i
        ... 
        4
        5

    """

    _None = ()  # Perhaps None is a valid value.

    def __init__(self, iterable):
        self._iterable = iter(iterable)
        self._buf = self._None

    def __iter__(self):
        return self

    def _is_empty(self):
        return self._buf is self._None

    def peek(self):
        """Peek at the next element.

        This may raise StopIteration.

        """
        if self._is_empty():
            self._buf = self._iterable.next()
        return self._buf

    def next(self):
        if self._is_empty():
            return self._iterable.next()
        ret = self._buf
        self._buf = self._None
        return ret

Add comment

Huh?, Steven Bethard, 2007/03/19
Did you look at the peekable class in the recipe? It doesn't use itertools.chain. You shouldn't have any risk of getting too many frames on the stack. (That's pretty much the whole point of using peekable() instead of peek() -- the former is much more efficient for repeated calls.)
Add comment



Highest rated recipes:

1. A simple XML-RPC server

2. Web service accessible ...

3. IPy Notify

4. Changing return value ...

5. Quantum Superposition

6. Pickle objects under ...

7. Generalized delegates ...

8. Reorder a sequence (uses ...

9. Setting Win32 System ...

10. ObjectMerger




Privacy Policy | Email Opt-out | Feedback | Syndication
© 2006 ActiveState Software Inc. All rights reserved.