ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Pyline: a grep-like, sed-like command-line tool.
Submitter: Graham Fawcett (other recipes)
Last Updated: 2006/03/30
Version no: 1.2
Category: System

 

5 stars 5 vote(s)


Description:

This utility was born from the fact that I keep forgetting how to use
"sed", and I suck at Perl. It brings ad-hoc command-line piping
sensibilities to the Python interpreter. (Version 1.2 does better
outputting of list-like results, thanks to Mark Eichin.)

Source:

#!/usr/bin/env python

# updated 2005.07.21, thanks to Jacob Oscarson
# updated 2006.03.30, thanks to Mark Eichin

import sys
import re
import getopt

# parse options for module imports
opts, args = getopt.getopt(sys.argv[1:], 'm:')
opts = dict(opts)
if '-m' in opts:
    for imp in opts['-m'].split(','):
        locals()[imp] = __import__(imp.strip())

cmd = ' '.join(args)
if not cmd.strip():
    cmd = 'line'                        # no-op
    
codeobj = compile(cmd, 'command', 'eval')
write = sys.stdout.write

for numz, line in enumerate(sys.stdin):
    line = line[:-1]
    num = numz + 1
    words = [w for w in line.strip().split(' ') if len(w)]
    result =  eval(codeobj, globals(), locals())
    if result is None or result is False:
        continue
    elif isinstance(result, list) or isinstance(result, tuple):
        result = ' '.join(map(str, result))
    else:
        result = str(result)
    write(result)
    if not result.endswith('\n'):
        write('\n')

Discussion:

Save the script as 'pyline' somewhere on your path, e.g. /usr/local/bin/pyline, and make it executable (e.g. chmod +x /usr/local/bin/pyline).

---

When working at the command line, it's very useful to pipe multiple
commands together. Common tools used in pipes include 'head' (show the
top lines of a file), 'tail' (show the bottom lines), 'grep' (search
the text for a pattern), 'sed' (reformat the text), etc. However,
Python is found lacking in this regard, because it's hard to write the
kind of one-liner that works well in an ad-hoc pipe statement.

Pyline tries to solve this problem. Use pyline to apply a Python
expression to every line of standard input, and return a value to be
sent to standard output. The expression can use any installed Python
modules. In the context of the expression, the variable "line" holds
the string value of the line; "words" is a list of all the non-empty,
space-separated words; and "num" is the line number (starting with 1).
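
As a rough illustration of that per-line contract, here is a minimal sketch of the same loop written as a generator. The function name pyline() and the namespace argument are inventions for this example, and it strips the newline with rstrip("\n") where the recipe uses line[:-1]; otherwise it mirrors the recipe's logic.

```python
def pyline(expr, lines, namespace=None):
    """Evaluate expr once per input line and yield each result that is kept."""
    code = compile(expr, "command", "eval")
    env = dict(namespace or {})
    for num, line in enumerate(lines, 1):      # num starts at 1
        line = line.rstrip("\n")               # assumption: recipe uses line[:-1]
        words = [w for w in line.strip().split(" ") if w]
        env.update(line=line, num=num, words=words)
        result = eval(code, env)
        if result is None or result is False:  # filtered out, print nothing
            continue
        if isinstance(result, (list, tuple)):  # version 1.2 behaviour
            result = " ".join(map(str, result))
        yield str(result)


sample = ["GET /index.html 200\n", "GET /missing 404\n"]
print(list(pyline("words[1]", sample)))
print(list(pyline("'%d: %s' % (num, line[:3])", sample)))
```

Passing a list of strings instead of sys.stdin makes it easy to try an expression before using it in a real pipe.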

Here are a couple of examples:

Print out the first 20 characters of every line in the tail of my
Apache access log:

tail access_log | pyline "line[:20]"

Print just the URLs in the access log (the seventh "word" in the line):

tail access_log | pyline "words[6]"

Here's a trickier one, showing how to do an import. List the current
directory, showing only files that are larger than 1 kilobyte:

ls | pyline -m os "os.path.isfile(line) and os.stat(line).st_size > 1024 and line"

I didn't say it was pretty. ;-) The "-m a,b,c" option will import
modules a, b and c for use in the subsequent expression. The "isfile
and stat and line" form shows how to do filtering: if an expression
returns a False or None value, then no line is sent to stdout.
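
The two ideas in that paragraph can be sketched on their own: turning a "-m" style string into importable names, and relying on the short-circuit "and" so that a failed test yields None or False. The helper name load_modules() and the sample file names are made up for this illustration; the recipe itself assigns into locals() instead of building a dict.

```python
import importlib


def load_modules(spec):
    """Turn a '-m' style string such as 'os, re' into a name -> module dict."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    return {name: importlib.import_module(name) for name in names}


env = load_modules("os, re")

# "test and line": a failed re.search() returns None, so the whole
# expression is None and the line is dropped; a match returns the line.
expr = "re.search(r'\\.py$', line) and line"

kept = []
for line in ["pyline.py", "notes.txt", "setup.py"]:
    result = eval(expr, dict(env, line=line))
    if result is None or result is False:
        continue
    kept.append(result)

print(kept)
```

Note that only None and False are filtered. An expression that evaluates to 0 or an empty string is still converted with str() and printed.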

This last tricky example re-implements the 'md5sum' command, to return
the MD5 digest values of all the .py files in the current directory.

ls *.py | pyline -m md5 "'%s %s' % (md5.new(file(line).read()).hexdigest(), line)"
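
The command above relies on the md5 module and the file() builtin, both of which are gone in Python 3. A minimal sketch of the same expression for Python 3, assuming hashlib and open() as the replacements, could look like this. The helper name md5_line() and the temporary demo file are only for illustration.

```python
import hashlib
import os
import tempfile


def md5_line(path):
    """Return '<hex digest> <path>', like the pyline expression above."""
    with open(path, "rb") as handle:        # read bytes, not text
        digest = hashlib.md5(handle.read()).hexdigest()
    return "%s %s" % (digest, path)


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
print(md5_line(demo_path))
os.remove(demo_path)
```

With a Python 3 interpreter, the equivalent one-liner should be along the lines of: ls *.py | pyline -m hashlib "'%s %s' % (hashlib.md5(open(line, 'rb').read()).hexdigest(), line)"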

Hopefully you get the idea. I've found it to be an invaluable addition
to my command-line toolkit.

Windows users: it works under Windows, but name it "pyline.py" instead of "pyline", and call it via a batch file so that the piping works properly.




Number of comments: 18

getopt alternative, Jacob Oscarson, 2005/07/21
Very practical script! Here is an alternative to using the import(..); construct in the python code: use getopt to get an option ('-m' here) with a list of modules to import. Import the getopt module, then replace code between 8 and 16 with this code:

opts, args = getopt.getopt(sys.argv[1:], 'm:')

opts = dict(opts)
if '-m' in opts:
    for imp in opts['-m'].split(','):
        locals()[imp] = __import__(imp.strip())

cmd = ' '.join(args)
if not cmd.strip():
    cmd = 'line'                        # no-op
The import list is comma separated with no spaces. Example:
cat 'foo' | pyline -m sys,os "


That's a great idea., Graham Fawcett, 2005/07/21
Oh, that's much better. Great idea, Jacob; I've updated the code with your recommendation.

incorrect EOL handling, Denis Barmenkov, 2005/07/22

line = line[:-1]
better way:
line = string.split(line, '\n')[0]


sasa sasa, 2005/07/30
what about string.split() ?

oops, should think before typing:, sasa sasa, 2005/07/30
what about line.strip()

line.strip() and side-effects, Graham Fawcett, 2005/08/15
I didn't want to use line.strip() in case the whitespace in the output was significant. I'm not sure that line.split('\n')[0] is more correct than line[:-1], though perhaps there are some Python implementations where this is important?
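
One case where the approaches in this thread differ is a final line that has no trailing newline. The sketch below compares them side by side; the three helper names are made up for the comparison, and rstrip("\n") is an extra option that nobody in the thread proposed.

```python
def chop(line):
    """The recipe's approach: always drop the last character."""
    return line[:-1]


def first_piece(line):
    """The suggested alternative: keep everything before the first newline."""
    return line.split("\n")[0]


def strip_newline(line):
    """Extra option (not from the thread): remove only trailing newlines."""
    return line.rstrip("\n")


for raw in ["two words  \n", "no newline"]:
    print(repr(chop(raw)), repr(first_piece(raw)), repr(strip_newline(raw)))
```

All three keep the significant trailing spaces of the first sample, but line[:-1] eats the last character of the second one.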

auto-handle lists, Mark Eichin, 2005/08/01

            if isinstance(result, list):
                result = " ".join(map(str, result))
            result = str(result)
allows things like
   pyline 'words[-1::-1]'
to do the obvious thing. (You can get back the original less desirable behaviour by simply wrapping the arg in repr() so there's no loss in generality.)
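
A small sketch of that behaviour, pulled out into a helper so it can be tried in isolation. The name render() is hypothetical, and the tuple check follows the recipe's version 1.2 instead of the snippet above, which only tests for list.

```python
def render(result):
    """Join list-like results with spaces; stringify everything else."""
    if isinstance(result, (list, tuple)):
        return " ".join(map(str, result))
    return str(result)


words = "alpha beta gamma".split(" ")

print(render(words[-1::-1]))        # joined: reversed words on one line
print(render(repr(words[-1::-1])))  # repr() keeps the list notation
```

Because repr() returns a string, wrapping the expression in it bypasses the join and restores the bracketed output.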

+0, Graham Fawcett, 2005/08/15
I see your point, and can imagine cases where a string-joined list representation would be favourable. I'm a bit hesitant, though; sometimes the list-formatted output is easier to read. Sometimes, I've used 'pyline "words"', just to get better visual delimiting between words in the output.

Maybe this is a behaviour that could be turned on via a command-line flag?

-j or --join: join list-like result via ' '.join(map(str, result))
Thoughts?
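
For what it's worth, such a flag could be wired into the existing getopt call roughly as follows. This is only a sketch of the proposal: parse_args() and render() are hypothetical names, and the recipe's version 1.2 ended up joining unconditionally instead of adding a flag.

```python
import getopt


def parse_args(argv):
    """Parse '-m a,b', '-j' / '--join' and the expression from argv."""
    opts, args = getopt.getopt(argv, "m:j", ["join"])
    opts = dict(opts)
    modules = [m.strip() for m in opts.get("-m", "").split(",") if m.strip()]
    join = "-j" in opts or "--join" in opts
    cmd = " ".join(args).strip() or "line"   # empty command is a no-op
    return modules, join, cmd


def render(result, join):
    """Join list-like results only when the flag was given."""
    if join and isinstance(result, (list, tuple)):
        return " ".join(map(str, result))
    return str(result)


print(parse_args(["-j", "-m", "os,re", "words"]))
print(render(["a", "b"], join=False))
print(render(["a", "b"], join=True))
```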

re: auto-handle lists, Graham Fawcett, 2006/03/30
Mark, after frequent use of the script, I've seen the error of my ways. List (and tuple) results are now joined via ' '.join(map(str, result)). Thanks.

using on windows, Michael Soulier, 2006/03/30
"""Windows users: it works under Windows, but name it "pyline.py" instead of "pyline", and call it via a batch file so that the piping works properly.""" Better yet. Add .PY to your PATHEXT environment variable. Then all python scripts can be called without extension.

Using on Windows???, John Clark, 2006/03/30
I am having trouble using this on windows - I already had .py as part of my pathext environment variables, but when I run something like:

ls | pyline -m os "os.path.isfile(line) and os.stat(line).st_size > 1024 and line"

I end up with:

Traceback (most recent call last):
  File "C:\windows\usr\utilities\pyline.py", line 24, in ?
    for numz, line in enumerate(sys.stdin):
IOError: [Errno 9] Bad file descriptor


What I am wondering is if even though I have .py in PATHEXT, there is still something to the statement "and call it via a batch file so that the piping works properly."

Anybody have an idea as to why this is happening?

Yes, it's got to be a batch file., Graham Fawcett, 2006/03/30
"What I am wondering is if even though I have .py in PATHEXT, there is still something to the statement 'and call it via a batch file so that the piping works properly.'"

Yes, it's got to be a batch file. I don't know the deep reasons for it, but a Web search for "python pipe windows bad file descriptor" might turn it up for you.

Here's a sample pyline.bat file. It assumes that pyline.py (the recipe) is in c:\python24; adjust as necessary.

@echo off
python c:\python24\pyline.py %1 %2 %3 %4 %5 %6 %7 %8 %9


With xxdiff scripts..., Martin Blais, 2006/04/03
Pretty nice, I was inspired: I wrote an additional xxdiff transformation script that uses this transformation method, similar to xxdiff-rename/xxdiff-filter, etc. This allows you to review the changes with a side-by-side graphical diff before they are applied, and you get backup files automatically as well. You can also cherry-pick the desired changes and save them over the original files. The new script is called xxdiff-pyline: http://furius.ca/xxdiff/doc/xxdiff-scripts.html#xxdiff-pyline (Note: all the scripts described in the documentation will be released with xxdiff 3.2 (soon)).

snapshots, Martin Blais, 2006/04/03
Snapshots here until I release it: http://furius.ca/downloads/xxdiff/snapshots/

Nice, Graham Fawcett, 2006/04/04
Nice application of the idea, Martin. Thanks. :-)

Object-oriented shell, Jack Orenstein, 2006/07/08
I've implemented something based on a similar idea, named osh. However, instead of using the shell's pipe to connect commands, I have everything running in one Python process. For example:

    osh f 'path(".").files()' ^ expand ^ select 'file: file.size > 100000' ^ f 'file: (str(file), file.size)' $

- osh: invokes osh.

- f 'path(".").files()': Run the function f, producing a list of files in the current directory.

- ^: The osh symbol for piping objects.

- expand: Turn the list into a stream of its elements (files).

- select 'file: file.size > 100000': If a file received as input has size > 100000 then pass it on as output, otherwise discard it.

- f 'file: (str(file), file.size)': Apply a function taking a file as input and generating a tuple of (file name, file size) as output. 

- $: Render input objects as strings and print to stdout.
For more info: http://geophile.com/osh

Non getopt alternative, Chris Stromberger, 2007/05/23
Instead of passing in -m and the list of modules, something like this might make it even simpler to use (let the script figure out what modules are needed):

import re
possibleModules = re.findall(r'(\w+)\.', cmd)
for m in possibleModules:
  try:
    locals()[m] = __import__(m)
  except:
    pass
Not tested much, but it works with the os and md5 examples given above.
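
A slightly more defensive sketch of the same idea, for anyone who wants to experiment with it. The function name guess_modules() is made up, the bare except is narrowed to ImportError, and the result is returned as a dict where the snippet above writes into locals(). Keep in mind that any name followed by a dot is tried, so importing a module with side effects is possible.

```python
import importlib
import re


def guess_modules(cmd):
    """Try to import every name that appears before a dot in cmd."""
    found = {}
    for name in set(re.findall(r"(\w+)\.", cmd)):
        try:
            found[name] = importlib.import_module(name)
        except ImportError:
            pass  # not a module (e.g. 'path' from os.path, or 'line')
    return found


mods = guess_modules(
    "os.path.isfile(line) and os.stat(line).st_size > 1024 and line"
)
print(sorted(mods))
```

The returned dict could then be passed to eval() as the namespace for the expression.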

Input field separator, Yannick Loiseau, 2007/07/12
Nice script! Small, useful, elegant. Here is a little patch to allow an alternative input field separator (à la awk):

------------------------------------------------------------
--- pyline      2007-07-12 12:13:19.000000000 +0200
+++ pyline.new  2007-07-12 12:04:08.000000000 +0200
@@ -7,12 +7,17 @@
 import re
 import getopt
 
+FS = " "
+
 # parse options for module imports
-opts, args = getopt.getopt(sys.argv[1:], 'm:')
+opts, args = getopt.getopt(sys.argv[1:], 'm:F:')
 opts = dict(opts)
 if '-m' in opts:
     for imp in opts['-m'].split(','):
         locals()[imp] = __import__(imp.strip())
+if '-F' in opts:
+    FS = opts['-F']
+    
 
 cmd = ' '.join(args)
 if not cmd.strip():
@@ -24,7 +29,7 @@
 for numz, line in enumerate(sys.stdin):
     line = line[:-1]
     num = numz + 1
-    words = [w for w in line.strip().split(' ') if len(w)]
+    words = [w for w in line.strip().split(FS) if len(w)]
     result =  eval(codeobj, globals(), locals())
     if result is None or result is False:
         continue
------------------------------------------------------------
e.g.
$ echo "foo;bar;baz" | pyline -F ';'  "words[1]"
bar
I thought about the same option for output fields, but that is already easy to do:
$ echo "foo bar baz" | pyline " ';'.join(words[0:2]) "
foo;bar
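
The splitting rule from the patch can be tried on its own with a small helper. The name split_words() is made up for this sketch; the list comprehension is the one from the patched line.

```python
def split_words(line, fs=" "):
    """Split line on the separator fs and drop empty fields."""
    return [w for w in line.strip().split(fs) if len(w)]


print(split_words("foo;bar;baz", ";")[1])
print(";".join(split_words("foo bar baz")[0:2]))
print(split_words("a;;b", ";"))
```

One difference from awk is worth knowing: because empty fields are discarded, "a;;b" yields two words here, whereas awk with a single-character separator would report three fields with an empty one in the middle.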




© 2006 ActiveState Software Inc. All rights reserved.