ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
Title: Groupby for ndarrays.
Submitter: Alexander Ross
Last Updated: 2006/07/11
Version no: 1.0
Category: Algorithms

Description:

This is a groupby function for arrays. Given one or more one-dimensional arrays of the same shape and a `key` function, it groups every array by the value of `key(args[0])`, that is, by the keys computed from the first array. Each returned array is two-dimensional, with one row per group: the size of the first dimension equals the number of groups, and the size of the second dimension equals the size of the largest group. Smaller groups are padded with the value of the keyword argument `fill_value` (default `0.0`).
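
To make the padded layout concrete, here is a minimal sketch of the same idea on plain Python lists. It is not the recipe's implementation: `group_and_pad` is a hypothetical helper introduced only for illustration, and it uses `itertools.groupby` on a sorted list instead of array masks.

```python
from itertools import groupby

def group_and_pad(values, key, fill_value=0):
    """Hypothetical helper: group `values` by `key`, then pad every
    group to the length of the largest one with `fill_value`."""
    ordered = sorted(values, key=key)  # sorted() is stable, so order within a group is kept
    groups = [list(g) for _, g in groupby(ordered, key=key)]
    width = max((len(g) for g in groups), default=0)
    return [g + [fill_value] * (width - len(g)) for g in groups]

rows = group_and_pad([10, 11, 12, 13, 14, 15, 16], key=lambda x: x % 3)
for row in rows:
    print(row)
# [12, 15, 0]
# [10, 13, 16]
# [11, 14, 0]
```

There are three keys (0, 1 and 2), so there are three rows; the largest group has three members, so every row has three columns.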

There's also a short recipe in here for functional composition.
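
The order in which the composed functions run is easy to get backwards, so here is a small sketch that repeats the recipe's `compose` and checks it on two toy functions. The names `add_one`, `double` and `key_of_pair` are made up for this example.

```python
from operator import itemgetter

def compose(*funcs):
    """compose(f, g)(x) == f(g(x)): the right-most function runs first."""
    def composed(arg):
        for f in reversed(funcs):
            arg = f(arg)
        return arg
    return composed

add_one = lambda x: x + 1
double = lambda x: x * 2

print(compose(add_one, double)(5))  # add_one(double(5)) -> 11
print(compose(double, add_one)(5))  # double(add_one(5)) -> 12

# The same pattern the recipe uses to sort (index, item) pairs by the item's key:
key_of_pair = compose(lambda x: x % 10, itemgetter(1))
print(key_of_pair((0, 137)))        # (137 % 10) -> 7
```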

Source:

from numpy import array, ones, unique
from operator import itemgetter

# functional composition: compose(f, g)(x) == f(g(x))
def compose(*args):
    def composed(arg):
        # apply the right-most function first
        for f in reversed(args):
            arg = f(arg)
        return arg
    return composed

def agroupby(*args, **kwds):
    """A groupby function which accepts and returns arrays.
    All passed arrays are expected to be one dimensional
    and of the same shape. All of the arrays are grouped by
    the value of `key` for each element of `args[0]` and then
    returned.  The returned arrays will be two dimensional with
    each row corresponding to a group.  The size of the first
    dimension is equal to the number of groups, and the size of
    the second dimension is equal to the size of the largest
    group.  All smaller groups are padded with the value of the
    keyword argument `fill_value`."""
    keyfunc = kwds.get('key', lambda a: a)
    fill_val = kwds.get('fill_value', 0.0)
    args = [a.copy() for a in args]
    # sort every array by the key of the corresponding item in args[0]
    argsort = sorted(enumerate(args[0]), key=compose(keyfunc, itemgetter(1)))
    indexsort = [index for index, item in argsort]
    args = [a.take(indexsort) for a in args]
    # calculate groups; the key is evaluated per element so that keys
    # which are not array-aware (e.g. lambda dt: dt.month) also work
    g_mask = array([keyfunc(item) for item in args[0]])
    g_set = unique(g_mask)
    g_max = max([g_mask[g_mask==g].shape[0] for g in g_set])
    # one (number of groups) x (largest group) array per input, pre-filled
    g_args = [fill_val * ones((len(g_set), g_max), dtype=a.dtype) for a in args]
    for gix, gval in enumerate(g_set):
        for ga, a in zip(g_args, args):
            b = a[g_mask==gval]
            ga[gix,:len(b)] = b
    return tuple(g_args)


if __name__ == "__main__":
    from numpy import arange, set_printoptions, random
    set_printoptions(precision=2, suppress=True, linewidth=60)
    # group 100..199 by last digit: ten groups of ten values each
    b = arange(100, 200)
    c = agroupby(b, key=lambda x: x%10)
    print(c)

    # group two parallel arrays by the last digit of the first one
    a = random.geometric(0.01, 20)
    b = a + 20
    c, d = agroupby(a, b, key=lambda x: x%10)
    print(c)
    print(d)

Discussion:

I wrote this to group an array of values by the dates on which the values were recorded. So, if `dates` were an array of `datetime` instances, and `vals` were an array of values recorded on each of those dates, you could group `dates` and `vals` by the month in which they were recorded by calling:

agroupby(dates, vals, key=lambda dt: dt.month)
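
As a rough sketch of what that call groups, the key can be evaluated by hand on a few sample dates using only the standard library. The `dates` and `vals` below are made-up data, and `by_month` and `expected_shape` are illustrative names, not part of the recipe.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: one value recorded on each date.
dates = [datetime(2006, 1, 5), datetime(2006, 1, 20), datetime(2006, 2, 3),
         datetime(2006, 3, 9), datetime(2006, 3, 15), datetime(2006, 3, 30)]
vals = [1.5, 2.0, 0.5, 3.0, 2.5, 4.0]

key = lambda dt: dt.month
months = [key(dt) for dt in dates]   # one key per date

# Collect the values that fall into each month.
by_month = {}
for month, val in zip(months, vals):
    by_month.setdefault(month, []).append(val)

# Number of groups and size of the largest group, i.e. the shape
# the padded two-dimensional result would have.
sizes = Counter(months)
expected_shape = (len(sizes), max(sizes.values()))

print(by_month)
print(expected_shape)
```

With this data there are three months and the largest one (March) holds three values, so each grouped array would have three rows and three columns, with the January and February rows padded by `fill_value`.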

© 2006 ActiveState Software Inc. All rights reserved.