Title: Shell-like data processing
Submitter: Maxim Krikun
Last Updated: 2004/04/06
Version no: 1.1
Category: Text

 

4 stars 4 vote(s)


Description:

This module introduces an alternative, shell-pipe-like syntax for sequence-oriented functions such as filter and map, using classes that define the __ror__ method.

Source:

from itertools import izip, imap, count, ifilter
import re

def cat(fname):
    return file(fname).xreadlines()

class grep:
    """keep only lines that match the regexp"""
    def __init__(self,pat,flags=0):
        self.fun = re.compile(pat,flags).match
    def __ror__(self,input):
        return ifilter(self.fun,input)

class tr:
    """apply arbitrary transform to each sequence element"""
    def __init__(self,transform):
        self.tr=transform
    def __ror__(self,input):
        return imap(self.tr,input)

class printlines_class:
    """print sequence elements one per line"""
    def __ror__(self,input):
        for l in input:
            print l

printlines=printlines_class()

class terminator:
    """to be used at the end of a pipe-sequence"""
    def __init__(self,method):
        self.process=method
    def __ror__(self,input):
        return self.process(input)

# these objects transform a generator into a list, dict or tuple
aslist  = terminator(list)
asdict  = terminator(dict)
astuple = terminator(tuple)

# this object transforms a sequence into a sequence of (index, item) tuples
enum = terminator( lambda input: izip(count(),input) )

#######################
# example 1: equivalent to shell grep ".*/bin/bash" /etc/passwd
cat('/etc/passwd') | tr(str.rstrip) | grep('.*/bin/bash') | printlines

#######################
# example 2: get a list of int's methods beginning with '__r'
dir(int) | grep('__r') | aslist

#######################
# example 3: useless; returns a dict {0:'l',1:'a',2:'m',3:'b',4:'d',5:'a'} 
'lambda' | enum | asdict

Discussion:

Python has several functions that operate on sequential data, e.g. filter, map, zip and sum. However, more complicated processing requires either intermediate variables or deeply nested function calls and list comprehensions. This is not as elegant as, for example, the Unix shell command "cat foo.bar | grep smth | sort | uniq".

Inspired by the "C++-like iostream" recipe by Erik Max Francis (no. 157034 in this cookbook), I made this quick-hack emulation of shell pipe syntax. The main advantage of this syntax is that the distinct operations in a sequence sit between |'s, so there are no nested brackets and no extra variables.
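
The mechanism can be sketched in Python 3 as follows. This is a hypothetical port for illustration, not the submitter's code: it assumes the built-in filter and map (already lazy in Python 3) as stand-ins for ifilter and imap, and the sample lines are made up. Python calls the right operand's __ror__ because lists, map objects and filter objects do not handle "|" themselves.

```python
import re

class grep:
    """Keep only items that match the regexp (anchored at the start, as re.match)."""
    def __init__(self, pat, flags=0):
        self.fun = re.compile(pat, flags).match
    def __ror__(self, input):
        # Invoked for `input | self` when `input` does not implement `|`.
        return filter(self.fun, input)

class tr:
    """Apply an arbitrary transform to each element."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)

class terminator:
    """Consume the pipeline with the given callable."""
    def __init__(self, method):
        self.process = method
    def __ror__(self, input):
        return self.process(input)

aslist = terminator(list)

# Made-up sample data standing in for /etc/passwd
lines = [
    "root:x:0:0::/root:/bin/bash\n",
    "daemon:x:1:1::/:/usr/sbin/nologin\n",
]
shells = lines | tr(str.rstrip) | grep(r".*/bin/bash") | aslist
```

Each stage only needs a __ror__ method, so new stages can be added without touching the existing ones.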

This is also useful in interactive mode for inspecting the contents of a generator: it is easier to append "| aslist" to an expression than to wrap the whole expression in a list(...) call.

Note also that everything here uses iterators where possible (ifilter, imap, izip), so no intermediate lists are built during processing.
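
The lazy behaviour can be checked with a small Python 3 sketch. The take stage and the square function are hypothetical helpers that are not part of the recipe, and the built-in map is assumed in place of imap. Feeding an infinite counter through the pipe terminates because each element is computed only when the final stage asks for it.

```python
from itertools import count, islice

class tr:
    """Apply an arbitrary transform to each element, lazily."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)

class take:
    """Hypothetical terminating stage: collect the first n items into a list."""
    def __init__(self, n):
        self.n = n
    def __ror__(self, input):
        return list(islice(input, self.n))

calls = []  # records which inputs were actually processed

def square(x):
    calls.append(x)
    return x * x

# count() never ends, yet only three elements are ever squared.
first_three = count() | tr(square) | take(3)
```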



Number of comments: 4

Fantastic hack!, Garth Kidd, 2004/04/11
I'd like to see a module along these lines as part of the standard library. Add cut, uniq, sort, and a few others, and you'd really have something.

A wee fussy mod or two:

class match:
    """keep only lines that match the regexp"""
    def __init__(self, pat, flags=0, method='match'):
        self.fun = getattr(re.compile(pat, flags), method)
    def __ror__(self, input):
        return ifilter(self.fun, input)

class search(match):
    def __init__(self, pat, flags=0):
        match.__init__(self, pat, flags, method='search')

grep = search
... and then later...
class writelines:
    "write each item to a file like object"
    def __init__(self, f):
        self.file = f
    def __ror__(self, input):
        for l in input:
            self.file.write(l)

import sys  # required for sys.stdout
printlines = writelines(sys.stdout)
The first makes grep behave more like the command-line tool (grep doesn't insist on matching at the beginning of the line by default). The second adds a more generic writelines class and re-implements printlines with it, which also eliminates an issue in which cat(file)|printlines would double the newlines.
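
Both points can be illustrated with a short Python 3 sketch. The sample lines are made up, and io.StringIO is assumed as a stand-in for sys.stdout so the output can be inspected.

```python
import io
import re

lines = [
    "/bin/bash is a shell\n",
    "login shell: /bin/bash\n",
    "no shell here\n",
]
pat = re.compile(r"/bin/bash")

# match() only succeeds at the start of the string; search() scans the whole line.
anchored = [l for l in lines if pat.match(l)]
anywhere = [l for l in lines if pat.search(l)]

printed = io.StringIO()
written = io.StringIO()
for l in anywhere:
    print(l, file=printed)  # print appends its own newline to the one already in l
    written.write(l)        # write emits the line unchanged
```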



Hmmm, Garth Kidd, 2004/04/11
We could define

def cat(fname, mode='rtU'):
    return file(fname, mode).xreadlines()
but it's a little redundant: in Python 2.3, at least, one can just as easily use file() directly, since iter(file-like-object) behaves equivalently to iter(file-like-object.xreadlines()).
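
For readers on Python 3, where file() and xreadlines() no longer exist, a cat stage might be sketched as below. This is an assumption about how one could port it, not code from the recipe: it relies on open() and on iterating the file object directly, and the temporary file is a made-up sample.

```python
import os
import tempfile

def cat(fname, mode="r"):
    """Yield the lines of a file lazily; the file is closed once it is exhausted."""
    with open(fname, mode) as f:
        for line in f:  # a file object is its own line iterator
            yield line

# Made-up sample file so the sketch is self-contained
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("first\nsecond\n")

lines = list(cat(path))
os.remove(path)
```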

Re: fantastic hack, Maxim Krikun, 2004/04/12
Thank you for your feedback. In fact, there was a similar hierarchy in the original module, but I cut it before posting here to keep the code easy to read. This recipe is, first of all, meant to demonstrate the new syntax, not to propose a complete module. Of course one can add analogs of other Unix tools, and the enhancements in recent Python versions, such as file() iterability and the enumerate() function, should be taken into account. However, I prefer to avoid writing code before I really need it.

module URL, Maxim Krikun, 2004/04/16
I have put the module text and related discussion on my personal wiki page: http://lbss.math.msu.su/~krikun/PipeSyntaxModule