| Store | Cart

Suggesting a new feature - "Inverse Generators"

From: Jordan Rastrick <jras...@student.usyd.edu.au>
25 Mar 2005 08:04:47 -0800
First, a disclaimer. I am a second year Maths and Computer Science
undergraduate, and this is my first time ever on Usenet (I guess I'm
part of the http generation). On top of that, I have been using Python
for a grand total of about a fortnight now. Hence, I apologise if what
follows is a stupid suggestion, or if its already been made somewhere
else, or if this is not the appropriate place to make it, etc. But I
did honestly do some background checking through the Python
documentation, www.python.org, FAQs, and so forth before making this
post. Basically what I'm trying to say here is, please don't flame this
post :)

Anyway, I have a suggestion to make for a rather radical new feature in
Python. Rather than just struggling to explain and justify it straight
out, I'll give the background of how I thought up the feature - and how
I worked through the ideas I came up with. The example may seem a bit
trivial to start, but bear with me, because I think theres at least
some possibility the suggestion has merit.

I am writing a project in Python atm, and one thing it needs to do is
parse a very simple text file to get a list of records. Its a
straightforward kind of thing that I (and I'm sure you) have done many
times before in many languages.

Each bit of interesting data in the file occurs over 3 or 4 lines of
text. On top of this there are lines of random uninteresting junk as
well.

I decided to write a generator to filter out the junk lines, strip
excess whitespace and the like (generators are one of my favourite
features in Python). So the main body of code looks like this:

def filterJunk(lines):
   for line in lines:
       # Do filtering & cleaning up stuff on line.
       # ...
       yield line

def getData(filename):
    recordList = []
    input = file(filename)
    lines = line.readlines()
    input.close()
    for line in filterJunk(lines):
        # ... create records and add to recordList

So far, so good. Next comes the business of actually creating the
records from the lines of interesting text.

You can probably see the problem I ran into - I need to look at 3 or 4
lines at a time to create the record. Theres no easy way out of this
requirement, because whether the record is contained in 3 or 4 lines is
conditional on the value of the second line. But the for loop isn't
going to allow me to do that.
So I rewrote the for loop above as:

while True:
  try:
  	# ....do stuff repeatedly with lines.next() to add to recordList[]
  except StopIteration:
     pass

This works. (And yes, there are also a multitude of other reasonably
simple ways to approach this problem, some of which I've used in this
past. And probably some fancy ways involving map and reduce as well.)

But it seems ugly to me somehow. And I'm suspicious of anything that
hints of ugliness (in Computer Science, anyway.) I suppose that
Python's getting me into the habit of assuming theres always an elegant
looking solution.

I tried, and failed, to think of some sort of Generator-based concept
(I really do love generators! So simple, but so useful!) that would
allow me to retain a for loop. I guess that, essentially, I feel this
task is really just iterating over the lines, and that iterating over
something in Python is a job for a for loop. Generators allow you to
use for loops all regular iteration. So I wanted to have something
"vaugely Generator like", that would allow me to get back to using a
for loop.

It occured to me - what if I had some sort of "inverse generator" -  a
function with a resumable call mechanism that retained state, but that
would resume when accepting input rather than returning output. For
want of a better term, call it an Acceptor.

If you think of a Generator as a method acting as (an abstraction of) a
source of sequential data, then the Acceptor correspondingly is a sink.


I started playing round with the Acceptor idea, thinking about how it
might work. I imagined a simple mirroring of Generator syntax - using a
new keyword, accept, to take in a value, just as Generator's use yield
to return a value. Here's my first attempt at 'coding' my precious for
loop using an imaginary Acceptor:

def combineIntoRecord(): # This is an acceptor function
       optionalline = None # We may not get given a value for this line
       accept firstline
       accept secondline
       if condition(secondline):
          accept optionalline
       accept lastline
       r = createRecord(firstline, secondline, lastline, optionalline)
       return r

recordlist = []
for line in lines:
     recordlist.append(combineIntoRecord(line))

I've since thought about it somewhat, and its clear I got this first
try just plain wrong. Its got a number of hideous defects. First,
combineIntoRecord() is 'accepting' its values by taking a magical
argument, line, that does not appear in the argument list.

Second, in this particular syntax, do
combineIntoRecord(x) 	at line 50
and
combineIntoRecord(y) 	at line 200
want x and y to be combined into the same record, using the old state,
or is the second trying to start a "new call" to the method?  For an
extreme example, imagine calls made in different modules. Basically,
the jump into combineIntoRecord code needs to somehow specify an
'instance' of the function, with a given existing state. The method
call needs to be bound to something, somehow.

Worst of all, though, since combineIntoRecord(line) doesn't always
return a value, how does the interpreter know when to call append? It
seems the 'magic' of the acceptor is spilling out into the surrounding
code, making it impossible to understand or intelligently parse. The
magicness needs to be contained within the acceptor itself.

How to resolve these issues?
Well, thinking about how generators achieve their magic-like effects:
    - A generator, despite being a function, doesn't return a value in
the usual sense.
    - Calls to a generator don't result in a value, they result in an
Iterator object.
    - The magic happens at the 'yield' keyword.
So, analguously:
   - Perhaps call to combineIntoRecord() shouldn't return a value
   - It should return an acceptor object instance instead
   - Every call results in a different object - solves problem 2 above
   - Acceptor objects have a special method for feeding them values,
takes care of problem 1 above
 BUT:
   - When and what does an acceptor object 'return'?
   - We don't actually want to restrict what an acceptor *function*
returns.
   - Acceptors (unlike generators) should be able to return just as per
ordinary functions
   - Its not the output of the function thats 'magic', its the input to
the function
Thus:
  - We must confine the magic to the input the function recieves - its
argument list.
  - Have a 'magic argument' that 'takes in' the accepted values.
  - This is fine - it keeps the magic within the call of the acceptor
function
  - If an acceptor returns like an ordinary function, it needs to
return when called
  - Hence it needs to know its accepted values "all at once"

>From our earlier thinking, an acceptor, fundamentally, accepts a
sequence of data.

So let the first argument to an acceptor function call be a "magic
sequence" that contains all the values we want to be accepted. Other
arguments may be passed as normal.

A disadvantage: the power of the acceptor is restricted in this
version, since you can't have elements Accepted at any arbitrary point
in the code. This is really a good thing, though - with the old version
it was the case that too much power was making the idea totally
unworkable. What we've done is essentially bound our Acceptor to
something, namely a sequence. Note, though, that we can use any
abstract source of sequential data - including a Generator - so theres
still quite a lot of flexibility in how it gets data for accepting.

To illustrate this model of an Acceptor, heres an example of a simple
Acceptor that returns the first element of a sequence that satisfies
some condtion (which is presumed to be a function):

def firstThatSatisfies(condition):
	accept element
	if condition(element): return element

To use, we simply call:

first = firstThatSatisfies(somelist, condition)

somelist, not appearing in the argument list, gets fed into the accept
statement by the 'acceptor magic'  .

Note how explicit iteration has been removed from this canonical
algorithm altogether. I dont know about you, but I think it looks kind
of neat.

A Slight Digression:

Clearly it might be true that condtion(element) == 0 for every element
in some list. This brings us to the issue of an accept statment that
doesnt get fed any data.

There are two possible ways to deal with this. One is to insist every
accept statement must get called, and raise an exception if this doesnt
happen. However, I prefer the seemingly more flexible option where you
simply say "Bad luck, once the magic sequece is empty, any remaining
acceptor code doesnt get executed, and any acceptor state is lost."
This means an acceptor might not always give a well defined return
value, but this is no worse than something like

def f()
     x = someCalculation()
     bool = someConditionThatWillAlwaysBeFalse()
     if false: return x

This conventional example has a return value of None, and an Acceptor
that doesnt reach an explict return statement can be treated exactly
the same way.

Digression

Now, going back to my original problem and the first Acceptor I wrote:

def combineIntoRecord(): # This is an acceptor function
       optionalline = None # We may not get given a value for this line
       accept firstline
       accept secondline
       if condition(secondline):
          accept optionalline
       accept lastline
       r = createRecord(firstline, secondline, lastline, optionalline)
       return r

This doesnt work anymore - it produces and returns the first record we
want, and then stops. We want a list of all records .

Maybe we could use a loop.The 'accepting magic' would need to be
confied to this loop. And the loop needs to know when the magic stream
of data comes to an end to know when to return the final value. Maybe
use another special keyword:

def combineIntoRecord():
         recordList = [].
         while acceptingonly
        	 # ....accept values, append createRecord to recordLlist


Perhaps. I don't really like the idea of having any magic new keywords
other than accept itself; if it needs this sort of explicit while loop,
maybe it shouldnt be in an Acceptor at all (the whole original point of
this idea was to get rid of while loop after all!) **

So are Acceptors pretty useless, after all that? Maybe. Then again,
maybe not.

What is this combineIntoRecord(), that I have written using
non-existent language features, trying to do, in a general sense?

It wants to take some sequence (the lines in a file) and produce
another sequence, that in some way depends first. Howevers it wants to
be able define basically any arbitary dependence between the sequences,
taking the elements from the first and producing output to the second
at its leisure.

The 'taking at its leisure' comes by being virtue of an Acceptor.
'Producing at its leisure' is a feature already available in Python.
Its called a Generator.

Behold:

# An Acceptor/Generator!!!
def combineIntoRecords():
       optionalline = None # We may not get given a value for this line
       accept firstline
       accept secondline
       if condition(secondline):
          accept optionalline
       accept lastline
       yield createRecord(firstline, secondline, optionalline,
lastline)

def getData(lines):
	return list(combineIntoRecords(filterJunk(lines)))

So, alas, my beloved for loop was not saved after all.

Fortunately, the dreaded While loop has been vanquished. In fact all
loopiness has disappeared. Along with any other semblance of a main
method.

I think I like the finished result. I'll leave if for you to decide if
you like it too.

Recent Messages in this Thread
Jordan Rastrick Mar 25, 2005 04:04 pm
Jordan Rastrick Mar 25, 2005 04:10 pm
Andrew Koenig Mar 25, 2005 04:18 pm
Jordan Rastrick Mar 25, 2005 04:36 pm
Andrew Koenig Mar 26, 2005 01:40 pm
Tim Hochberg Mar 25, 2005 04:25 pm
Michael Spencer Mar 25, 2005 04:46 pm
Jordan Rastrick Mar 25, 2005 05:23 pm
Michael Spencer Mar 25, 2005 06:41 pm
Serge Orlov Mar 25, 2005 07:14 pm
Jordan Rastrick Mar 26, 2005 06:56 am
Peter Otten Mar 26, 2005 08:04 am
Scott David Daniels Mar 25, 2005 07:34 pm
Michael Spencer Mar 25, 2005 08:07 pm
Scott David Daniels Mar 25, 2005 09:00 pm
Bengt Richter Mar 25, 2005 10:11 pm
Bengt Richter Mar 25, 2005 07:43 pm
Terry Reedy Mar 25, 2005 07:13 pm
Jordan Rastrick Mar 25, 2005 04:49 pm
Diez B. Roggisch Mar 25, 2005 06:13 pm
Jordan Rastrick Mar 26, 2005 04:11 pm
phil...@yahoo.com Mar 27, 2005 03:44 am
Oren Tirosh Mar 27, 2005 06:58 am
Messages in this thread