Title: Shell-like data processing
Submitter: Maxim Krikun
Last Updated: 2004/04/06
Version no: 1.1
Category: Text
4 vote(s)
Description:
This module introduces an alternative syntax, à la shell pipes, for sequence-oriented functions such as filter, map, etc. It does so with a few small classes that override the __ror__ method, so that "sequence | stage" hands the sequence to the stage.
Source:

from itertools import izip, imap, count, ifilter
import re

def cat(fname):
    return file(fname).xreadlines()

class grep:
    """keep only lines that match the regexp"""
    def __init__(self,pat,flags=0):
        self.fun = re.compile(pat,flags).match
    def __ror__(self,input):
        return ifilter(self.fun,input)

class tr:
    """apply arbitrary transform to each sequence element"""
    def __init__(self,transform):
        self.tr=transform
    def __ror__(self,input):
        return imap(self.tr,input)

class printlines_class:
    """print sequence elements one per line"""
    def __ror__(self,input):
        for l in input:
            print l

printlines=printlines_class()

class terminator:
    """to be used at the end of a pipe-sequence"""
    def __init__(self,method):
        self.process=method
    def __ror__(self,input):
        return self.process(input)

aslist = terminator(list)
asdict = terminator(dict)
astuple = terminator(tuple)
enum = terminator( lambda input: izip(count(),input) )

cat('/etc/passwd') | tr(str.rstrip) | grep('.*/bin/bash') | printlines
dir(int) | grep('__r') | aslist
'lambda' | enum | asdict
Discussion:
Python has several functions that operate on sequential data, e.g. filter, map, zip, sum, etc. However, to do some complicated processing one has to introduce intermediate variables, or build complex nested function calls or list comprehensions. This is not as elegant as, for example, the unix shell command "cat foo.bar | grep smth | sort | uniq".
Inspired by the "C++-like iostream" recipe by Erik Max Francis (no. 157034 in this cookbook), I made this quick-hack emulation of shell pipe syntax. The main advantage of such syntax is that the distinct operations in a sequence sit between |'s, so there are no nested brackets to untangle and no extra variables either.
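As a rough illustration of that difference, here is a minimal Python 3 sketch (the recipe itself is Python 2; the lazy built-in map and filter stand in for imap and ifilter here, and the sample lines are made up) that performs the same processing once as a nested call and once as a pipe:

```python
import re

class grep:
    """Keep only items that match the regexp (anchored at the start, as re.match is)."""
    def __init__(self, pat, flags=0):
        self.fun = re.compile(pat, flags).match
    def __ror__(self, input):
        return filter(self.fun, input)

class tr:
    """Apply an arbitrary transform to each item."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)

class terminator:
    """Used at the end of a pipe to collect the result."""
    def __init__(self, method):
        self.process = method
    def __ror__(self, input):
        return self.process(input)

aslist = terminator(list)

# Made-up sample data standing in for the lines of a passwd-style file.
lines = [
    "root:x:0:0::/root:/bin/bash\n",
    "daemon:x:1:1::/:/usr/sbin/nologin\n",
    "max:x:1000:1000::/home/max:/bin/bash\n",
]

# Nested style: read from the inside out.
nested = list(filter(re.compile('.*/bin/bash').match, map(str.rstrip, lines)))

# Pipe style: read left to right, one operation per stage.
piped = lines | tr(str.rstrip) | grep('.*/bin/bash') | aslist
```

Both expressions give the same list. The pipe works because a list, a map object, and a filter object define no | operator of their own, so Python falls back to the __ror__ method of the object on the right.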
This is also useful in interactive mode, to see the content of a generator. It seems easier to add "| aslist" at the end of an expression than to wrap the whole expression in a list(...) call.
Note also that everything here uses lazy iterators where possible, so the intermediate stages do not build full lists in memory. Only a terminator such as aslist materializes the result.
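The laziness can be checked with a small Python 3 sketch. It assumes a tr class equivalent to the one above (with the built-in map in place of imap) and a hypothetical numbers() generator that records what has been pulled from it:

```python
from itertools import count, islice

class tr:
    """Apply an arbitrary transform to each item, lazily."""
    def __init__(self, transform):
        self.tr = transform
    def __ror__(self, input):
        return map(self.tr, input)

pulled = []  # records every item actually consumed from the source

def numbers():
    # Hypothetical infinite source; a list could never hold all of it.
    for n in count():
        pulled.append(n)
        yield n

squares = numbers() | tr(lambda n: n * n)
pulled_before = list(pulled)           # building the pipe consumes nothing

first_three = list(islice(squares, 3))  # pull exactly three results
```

Nothing is read from the source when the pipe is built, and taking three results pulls only three items through it, even though the source never ends.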
Number of comments: 4
Fantastic hack!, Garth Kidd, 2004/04/11
I'd like to see a module along these lines as part of the standard
library. Add cut, uniq, sort, and a few others, and you'd really
have something.
A wee fussy mod or two:
class match:
    """keep only lines that match the regexp"""
    def __init__(self, pat, flags=0, method='match'):
        self.fun = getattr(re.compile(pat, flags), method)
    def __ror__(self, input):
        return ifilter(self.fun, input)

class search(match):
    def __init__(self, pat, flags=0):
        match.__init__(self, pat, flags, method='search')

grep = search

... and then later...

import sys

class writelines:
    "write each item to a file like object"
    def __init__(self, f):
        self.file = f
    def __ror__(self, input):
        for l in input:
            self.file.write(l)

printlines = writelines(sys.stdout)
The first makes grep behave more like the command-line tool (which doesn't insist that the match start at the beginning of the line by default). The second adds a more generic writelines class and re-implements printlines with it; it also eliminates an issue in which cat(file)|printlines would double the newlines, because each line already ends in a newline and print adds another.
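Both points can be seen in isolation with a short Python 3 sketch (the sample line is made up, and io.StringIO stands in for sys.stdout so the output can be inspected):

```python
import io
import re

line = "max:x:1000:1000::/home/max:/bin/bash\n"
pat = re.compile('bash')

anchored = pat.match(line)    # must match at the start of the string
anywhere = pat.search(line)   # may match at any position, like grep

# print() appends its own newline to a line that already has one ...
via_print = io.StringIO()
print(line, file=via_print)

# ... whereas write() emits the line exactly as it is.
via_write = io.StringIO()
via_write.write(line)
```

Here match finds nothing because 'bash' is not at the start of the line, which is why the original examples prefix their patterns with '.*'.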
Hmmm, Garth Kidd, 2004/04/11
We could define

def cat(fname, mode='rtU'):
    return file(fname, mode).xreadlines()

but it's a little redundant, as one can just as easily use file() directly, in Python 2.3 at least, thanks to iter(file-like-object) being suitably equivalent to iter(file-like-object.xreadlines()) in behaviour.
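For readers on Python 3, where file() and xreadlines() no longer exist, the same idea might look like the sketch below. The generator form of cat and the temporary file are assumptions made for illustration, not part of the recipe:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

def cat(fname):
    """Yield the lines of a file lazily; the file is closed when iteration ends."""
    with open(fname) as f:
        for line in f:   # an open file object is its own line iterator
            yield line

lines = list(cat(path))
os.remove(path)
```

Since the object returned by open() is already iterable line by line, it can also be placed directly on the left of a pipe; the generator only adds the guarantee that the file is closed afterwards.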
Re: fantastic hack, Maxim Krikun, 2004/04/12
Thank you for your feedback. In fact, there was a similar hierarchy in the original module, but I cut it out before posting here to keep the recipe easy to read.
This recipe is, first of all, meant to demonstrate the new syntax, not to propose a complete module.
Of course one can add analogs of other unix tools, and the enhancements in recent Python versions, such as file() iterability, the enumerate() function, et cetera, should be taken into account. However, I prefer to avoid writing code before I really need it.
module URL, Maxim Krikun, 2004/04/16
I have put the module text and related discussion on my personal wiki page: http://lbss.math.msu.su/~krikun/PipeSyntaxModule