Pyline: a grep-like, sed-like command-line tool.

This utility was born from the fact that I keep forgetting how to use "sed", and I suck at Perl. It brings ad-hoc command-line piping sensibilities to the Python interpreter. (Version 1.2 handles list-like results better, thanks to Mark Eichin.)

#!/usr/bin/env python

# updated 2005.07.21, thanks to Jacob Oscarson
# updated 2006.03.30, thanks to Mark Eichin

import sys
import re
import getopt

# parse options for module imports
opts, args = getopt.getopt(sys.argv[1:], 'm:')
opts = dict(opts)
if '-m' in opts:
    for imp in opts['-m'].split(','):
        locals()[imp] = __import__(imp.strip())

cmd = ' '.join(args)
if not cmd.strip():
    cmd = 'line'                        # no-op
    
codeobj = compile(cmd, 'command', 'eval')
write = sys.stdout.write

for numz, line in enumerate(sys.stdin):
    line = line[:-1]
    num = numz + 1
    words = [w for w in line.strip().split(' ') if len(w)]
    result =  eval(codeobj, globals(), locals())
    if result is None or result is False:
        continue
    elif isinstance(result, list) or isinstance(result, tuple):
        result = ' '.join(map(str, result))
    else:
        result = str(result)
    write(result)
    if not result.endswith('\n'):
        write('\n')


Save the script as 'pyline' somewhere on your path, e.g. /usr/local/bin/pyline, and make it executable (e.g. chmod +x /usr/local/bin/pyline).

When working at the command line, it's very useful to pipe multiple commands together. Common tools used in pipes include 'head' (show the top lines of a file), 'tail' (show the bottom lines), 'grep' (search the text for a pattern), 'sed' (reformat the text), etc. Python falls short in this regard, though, because it's hard to write the kind of one-liner that works well in an ad-hoc pipe statement.

Pyline tries to solve this problem. Use pyline to apply a Python expression to every line of standard input, and return a value to be sent to standard output. The expression can use any installed Python modules. In the context of the expression, the variable "line" holds the string value of the line; "words" is a list of all the non-empty, space-separated words; and "num" is the line number (starting with 1).

Here are a couple of examples:

Print out the first 20 characters of every line in the tail of my Apache access log:

tail access_log | pyline "line[:20]"

Print just the URLs in the access log (the seventh "word" in the line):

tail access_log | pyline "words[6]"
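To see what those two expressions pick out, here is a small Python 3 sketch that builds "words" the same way the recipe does and applies both expressions to one sample line. The sample line is an assumption for illustration (it is the stock Common Log Format example from Apache's documentation); words[6] is the URL only as long as no earlier field contains a space.

```python
# A sample line in Apache's Common Log Format; illustrative only.
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# The recipe builds "words" this way: split on single spaces, drop empty strings.
words = [w for w in line.strip().split(' ') if len(w)]

first_20 = line[:20]   # what pyline "line[:20]" prints for this line
url = words[6]         # what pyline "words[6]" prints: the request path
```

Here first_20 is '127.0.0.1 - frank [1' and url is '/apache_pb.gif'; words[5] is '"GET', the quoted request method.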

Here's a trickier one, showing how to do an import. List the current directory, showing only files that are larger than 1 kilobyte:

ls | pyline -m os "os.path.isfile(line) and os.stat(line).st_size > 1024 and line"

I didn't say it was pretty. ;-) The "-m a,b,c" option will import modules a, b and c for use in the subsequent expression. The "isfile and stat and line" form shows how to do filtering: if an expression returns a False or None value, then no line is sent to stdout.
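The filtering trick relies on how Python's "and" works: it returns its first falsy operand, or the last operand if all are truthy. So the expression yields either False or the line itself. A minimal Python 3 sketch of that, with the expression wrapped in a hypothetical function keep_if_large and the recipe's skip rule written out as a hypothetical helper emitted:

```python
import os
import tempfile

def keep_if_large(line, min_size=1024):
    # Same expression as the ls example: evaluates to False or to the line itself.
    return os.path.isfile(line) and os.stat(line).st_size > min_size and line

def emitted(results):
    # The recipe skips a result only if it is None or False (checked by identity),
    # so 0 or '' would still be written (as '0' or a blank line).
    return [r for r in results if r is not None and r is not False]

# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, 'big.dat')
    small = os.path.join(tmp, 'small.dat')
    with open(big, 'wb') as fh:
        fh.write(b'x' * 2048)
    with open(small, 'wb') as fh:
        fh.write(b'x' * 10)
    names = [big, small, os.path.join(tmp, 'missing.dat')]
    kept = emitted(keep_if_large(n) for n in names)   # only the 2048-byte file survives
```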

This last tricky example re-implements the 'md5sum' command, to return the MD5 digest values of all the .py files in the current directory.

ls *.py | pyline -m md5 "'%s %s' % (md5.new(file(line).read()).hexdigest(), line)"
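The md5 module and the file() builtin only exist in Python 2. A sketch of the same expression for Python 3, assuming hashlib.md5 and open(..., 'rb') as the replacements, wrapped in a hypothetical function md5sum_line so it can be tried outside pyline:

```python
import hashlib
import os
import tempfile

def md5sum_line(line):
    # Python 3 stand-in for the recipe's expression: hashlib.md5 replaces the
    # old md5 module, and open(..., 'rb') replaces the old file() builtin.
    with open(line, 'rb') as fh:
        return '%s %s' % (hashlib.md5(fh.read()).hexdigest(), line)

# Demonstration on a throwaway file containing b'hello'.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hello.txt')
    with open(path, 'wb') as fh:
        fh.write(b'hello')
    result = md5sum_line(path)   # '5d41402abc4b2a76b9719d911017c592 ' + path
```

On the command line, the Python 3 version of the example would then presumably read: ls *.py | pyline -m hashlib "'%s %s' % (hashlib.md5(open(line, 'rb').read()).hexdigest(), line)"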

Hopefully you get the idea. I've found it to be an invaluable addition to my command-line toolkit.

Windows users: it works under Windows, but name it "pyline.py" instead of "pyline", and call it via a batch file so that the piping works properly.

Tags: sysadmin

22 comments

Jacob Oscarson 18 years, 9 months ago

getopt alternative. Very practical script! Here is an alternative to using the __import__(..); construct in the Python code: use getopt to get an option ('-m' here) with a list of modules to import.

Import the getopt module, then replace lines 8 to 16 with this code:

opts, args = getopt.getopt(sys.argv[1:], 'm:')

opts = dict(opts)
if '-m' in opts:
    for imp in opts['-m'].split(','):
        locals()[imp] = __import__(imp.strip())

cmd = ' '.join(args)
if not cmd.strip():
    cmd = 'line'                        # no-op

The import list is comma separated with no spaces. Example:

cat 'foo' | pyline -m sys,os "

Graham Fawcett (author) 18 years, 9 months ago

That's a great idea. Oh, that's much better. Great idea, Jacob; I've updated the code with your recommendation.

Denis Barmenkov 18 years, 9 months ago

incorrect EOL handling.

line = line[:-1]

better way:

line = string.split(line, '\n')[0]

sasa sasa 18 years, 9 months ago

what about string.split() ?

sasa sasa 18 years, 9 months ago

ooops, should think before typing: what about line.strip()

Mark Eichin 18 years, 8 months ago

auto-handle lists.

if isinstance(result, list):
    result = " ".join(map(str, result))
result = str(result)

allows things like

pyline 'words[-1::-1]'

to do the obvious thing. (You can get back the original less desirable behaviour by simply wrapping the arg in repr() so there's no loss in generality.)

Graham Fawcett (author) 18 years, 8 months ago

line.strip() and side-effects. I didn't want to use line.strip() in case the whitespace in the output was significant.

I'm not sure that line.split('\n')[0] is more correct than line[:-1], though perhaps there are some Python implementations where the difference is important?
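One case where the two do differ is the last line of input when the file doesn't end with a newline: iterating over sys.stdin hands that line over without a trailing '\n', so line[:-1] drops a real character, while Denis's split('\n')[0] keeps it. A minimal Python 3 sketch of another alternative, using a hypothetical helper strip_eol that removes a newline only when one is present and leaves significant whitespace alone:

```python
def strip_eol(line):
    # Remove one trailing '\n' only when it is present; leading and trailing
    # spaces (which may be significant) are kept, unlike line.strip().
    return line[:-1] if line.endswith('\n') else line

last_line = 'no newline at EOF'           # e.g. the final line of a file without a trailing '\n'
chopped = last_line[:-1]                  # 'no newline at EO': the slice eats a real character
kept = strip_eol(last_line)               # unchanged
split_version = last_line.split('\n')[0]  # Denis's form also leaves it intact
```

If '\r\n' line endings are a concern, line.rstrip('\r\n') is another option, though it removes every trailing CR and LF rather than exactly one newline.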

Graham Fawcett (author) 18 years, 8 months ago

+0. I see your point, and can imagine cases where a string-joined list representation would be favourable. I'm a bit hesitant, though; sometimes the list-formatted output is easier to read. Sometimes, I've used 'pyline "words"', just to get better visual delimiting between words in the output.

Maybe this is a behaviour that could be turned on via a command-line flag?

-j or --join: join list-like result via ' '.join(map(str, result))

Thoughts?

Graham Fawcett (author) 18 years ago

re: auto-handle lists. Mark, after frequent use of the script, I've seen the error of my ways. List (and tuple) results are now joined via ' '.join(map(str, result)). Thanks.

Michael Soulier 18 years ago

using on windows. """Windows users: it works under Windows, but name it "pyline.py" instead of "pyline", and call it via a batch file so that the piping works properly."""

Better yet. Add .PY to your PATHEXT environment variable.

Then all python scripts can be called without extension.

John Clark 18 years ago

Using on Windows??? I am having trouble using this on Windows. I already had .py in my PATHEXT environment variable, but when I run something like:

ls | pyline -m os "os.path.isfile(line) and os.stat(line).st_size > 1024 and line"

I end up with:

Traceback (most recent call last):
  File "C:\windows\usr\utilities\pyline.py", line 24, in ?
    for numz, line in enumerate(sys.stdin):
IOError: [Errno 9] Bad file descriptor

What I am wondering is if even though I have .py in PATHEXT, there is still something to the statement "and call it via a batch file so that the piping works properly."

Anybody have an idea as to why this is happening?

Graham Fawcett (author) 18 years ago

Yes, it's got to be a batch file. "What I am wondering is if even though I have .py in PATHEXT, there is still something to the statement 'and call it via a batch file so that the piping works properly.'"

Yes, it's got to be a batch file. I don't know the deep reasons for it, but a Web search for "python pipe windows bad file descriptor" might turn it up for you.

Here's a sample pyline.bat file. It assumes that pyline.py (the recipe) is in c:\python24; adjust as necessary.

@echo off
python c:\python24\pyline.py %*

Martin Blais 18 years ago

With xxdiff scripts... Pretty nice, I was inspired: I wrote an additional xxdiff transformation script that uses this transformation method, similar to xxdiff-rename/xxdiff-filter, etc.

This allows you to review the changes with a side-by-side graphical diff before they are applied, and you get backup files automatically as well. You can also cherry-pick the desired changes and save them over the original files.

The new script is called xxdiff-pyline: http://furius.ca/xxdiff/doc/xxdiff-scripts.html#xxdiff-pyline

(Note: all the scripts described in the documentation will be released with xxdiff 3.2 (soon)).

Martin Blais 18 years ago

snapshots. Snapshots here until I release it: http://furius.ca/downloads/xxdiff/snapshots/

Graham Fawcett (author) 18 years ago

Nice. Nice application of the idea, Martin. Thanks. :-)

Jack Orenstein 17 years, 9 months ago

Object-oriented shell. I've implemented something based on a similar idea, named osh. However, instead of using the shell's pipe to connect commands, I have everything running in one Python process. For example:

    osh f 'path(".").files()' ^ expand ^ select 'file: file.size > 100000' ^ f 'file: (str(file), file.size)' $

- osh: invokes osh.

- f 'path(".").files()': Run the function f, producing a list of files in the current directory.

- ^: The osh symbol for piping objects.

- expand: Turn the list into a stream of objects (the files).

- select 'file: file.size > 100000': If a file received as input has size > 100000, pass it on as output; otherwise discard it.

- f 'file: (str(file), file.size)': Apply a function taking a file as input and generating a tuple of (file name, file size) as output.

- $: Render input objects as strings and print to stdout.

For more info: http://geophile.com/osh
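For comparison, the same single-process idea can be approximated in plain Python with generators. This sketch does not use osh or its API; the helper names files, select and apply_each are hypothetical stand-ins for the osh stages described above, chained so that each one pulls items lazily from the previous one:

```python
import os
import tempfile

def files(path='.'):
    # Roughly f 'path(".").files()' ^ expand: yield each regular file, one at a time.
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            yield full

def select(pred, items):
    # Roughly osh's select: pass an item on only if pred(item) is true.
    return (item for item in items if pred(item))

def apply_each(fn, items):
    # Roughly osh's f: transform every item that flows through.
    return (fn(item) for item in items)

def big_files(path='.', min_size=100000):
    large = select(lambda p: os.path.getsize(p) > min_size, files(path))
    return apply_each(lambda p: (p, os.path.getsize(p)), large)

# Demonstration with one large and one small throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    for name, size in (('big.bin', 100001), ('small.bin', 10)):
        with open(os.path.join(tmp, name), 'wb') as fh:
            fh.write(b'x' * size)
    found = list(big_files(tmp))   # one (path, size) tuple, for big.bin
```

Printing each tuple, e.g. for name, size in big_files(): print(name, size), plays the role of osh's trailing $.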

Chris Stromberger 16 years, 11 months ago

Non-getopt alternative. Instead of passing in -m and the list of modules, something like this might make it even simpler to use (let the script figure out which modules are needed):

import re
possibleModules = re.findall(r'(\w+)\.', cmd)
for m in possibleModules:
    try:
        locals()[m] = __import__(m)
    except ImportError:
        pass  # not a module name (e.g. "line" in line.strip()), so skip it

Not tested much, but it works with the os and md5 examples given above.
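A variant of this idea, sketched for Python 3 under a few assumptions: it only considers names at the start of a dotted expression (so "path" in os.path is not tried on its own), it catches only ImportError, and it returns a namespace dict instead of writing into locals(). The function name auto_import is hypothetical. Note that any automatic import runs the module's top-level code (try an expression mentioning this.something), which is one argument for keeping the explicit -m list.

```python
import importlib
import re

# A name followed by '.', not itself preceded by a word character or a dot.
LEADING_NAME = re.compile(r'(?<![\w.])([A-Za-z_]\w*)\.')

def auto_import(cmd, namespace=None):
    """Import modules whose names lead a dotted expression in cmd."""
    namespace = {} if namespace is None else namespace
    for name in set(LEADING_NAME.findall(cmd)):
        if name in namespace:
            continue  # already bound, e.g. by an explicit import
        try:
            namespace[name] = importlib.import_module(name)
        except ImportError:
            pass  # e.g. "line" in line.encode() is a variable, not a module
    return namespace

ns = auto_import("os.path.isfile(line) and os.stat(line).st_size > 1024 and line")
ns2 = auto_import("hashlib.md5(line.encode()).hexdigest()")
```

The resulting dict could then be passed as the globals argument to eval().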

Yannick Loiseau 16 years, 9 months ago

Input field separator. nice script! small, useful, elegant. Here is a little patch to allow alternative input field separator (à la awk)

------------------------------------------------------------
--- pyline      2007-07-12 12:13:19.000000000 +0200
+++ pyline.new  2007-07-12 12:04:08.000000000 +0200
@@ -7,12 +7,17 @@
 import re
 import getopt

+FS = " "
+
 # parse options for module imports
-opts, args = getopt.getopt(sys.argv[1:], 'm:')
+opts, args = getopt.getopt(sys.argv[1:], 'm:F:')
 opts = dict(opts)
 if '-m' in opts:
     for imp in opts['-m'].split(','):
         locals()[imp] = __import__(imp.strip())
+if '-F' in opts:
+    FS = opts['-F']
+

 cmd = ' '.join(args)
 if not cmd.strip():
@@ -24,7 +29,7 @@
 for numz, line in enumerate(sys.stdin):
     line = line[:-1]
     num = numz + 1
-    words = [w for w in line.strip().split(' ') if len(w)]
+    words = [w for w in line.strip().split(FS) if len(w)]
     result =  eval(codeobj, globals(), locals())
     if result is None or result is False:
         continue
------------------------------------------------------------

e.g.

$ echo "foo;bar;baz" | pyline -F ';'  "words[1]"
bar

Thought about the same option for output fields, but it's easy to do

$ echo "foo bar baz" | pyline " ';'.join(words[0:2]) "
foo;bar
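One thing worth knowing about the default: splitting on a single space means tabs are not field separators. Python's str.split() with no argument (equivalently, split(None)) splits on any run of whitespace, which is closer to awk's default behaviour. A small Python 3 sketch of the patched "words" logic as a hypothetical function split_words, showing both separators and the output-join trick above:

```python
def split_words(line, fs=' '):
    # The "words" logic with the -F patch: split on fs, drop empty fields.
    # Passing fs=None (an assumption, not part of the patch) splits on any whitespace run.
    return [w for w in line.strip().split(fs) if len(w)]

semi = split_words('foo;bar;baz', ';')                   # like pyline -F ';'
spaces_only = split_words('a\tb  c')                      # the tab stays inside a field
any_whitespace = split_words('a\tb  c', None)             # tabs and spaces both separate
joined = ';'.join(split_words('foo bar baz')[0:2])        # the output-separator trick
```

Here semi is ['foo', 'bar', 'baz'], spaces_only is ['a\tb', 'c'], any_whitespace is ['a', 'b', 'c'] and joined is 'foo;bar'.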

Jeremy Sproat 15 years, 3 months ago

It's interesting how this compares to grin. I definitely like being able to put arbitrary Python code on the command line like pyline, but it'd be awesome if pyline could do grin-like magic, like implicitly grepping the current directory.

Pádraig Brady 14 years, 10 months ago

Cool! I just noticed this while searching for something else. I've created a very similar script called funcpy: http://www.pixelbeat.org/scripts/funcpy

markild 14 years, 1 month ago

Love it!

Any chance something like this could be put on github or something? Guess that would make keeping track of versions a tad easier :)

Wes Turner 8 years, 8 months ago

Thank you so much! I've taken the liberty of packaging and maintaining compatibility with this script on GitHub and on PyPI:

https://pypi.python.org/pypi/pyline
https://github.com/westurner/pyline (master == stable)
https://github.com/westurner/pyline/tree/develop

Also great:

http://pypi.python.org/pypi/grin (grin, grind)
https://code.google.com/p/pyp/ (pyp)

Pyline: a grep-like, sed-like command-line tool. (Python recipe) by Graham Fawcett
ActiveState Code (http://code.activestate.com/recipes/437932/)