Closures are powerful. Closures are beautiful. But: Closures are TRICKY!
This is an anti-recipe: a caveat about some obscure pitfalls with closures, or rather with the way they are implemented in Python.
And now for the caveats: Two no-no's...
Don't create more than one instance of the same closure per normal function!
Don't create more than one instance of the same closure per generation cycle in a generator function!
Here is why...
```python
'''
According to my expectations, the three functions in the code below should all
produce the same result. I've been stunned to discover that this is not the case.
'''

def test1():
    for i in range(5):
        def call(): return i
        yield call

def test2():
    calls = []  # renamed from 'all' to avoid shadowing the builtin
    for i in range(5):
        def call(): return i
        calls.append(call)
    return calls

def test3():
    def MakeCall(i):
        def call(): return i
        return call
    calls = []
    for i in range(5):
        calls.append(MakeCall(i))
    return calls

print()
for test in [test1, test2, test3]:
    print(test.__name__, ':', [f() for f in test()])

expected_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [0, 1, 2, 3, 4]
test3 : [0, 1, 2, 3, 4]
'''

actual_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [4, 4, 4, 4, 4]   # <= this is the stunning thing !!!
test3 : [0, 1, 2, 3, 4]
'''
```
Discussion
I've been using closures in all kinds of situations, and until recently they always behaved in line with my expectations. The above result really stunned me... What the heck was going on here??? It seems that all closure instances constructed within test2 close over the value of i as it is at the end of the loop, regardless of their construction time and context. Now THAT was definitely NOT what I expected...
According to my understanding of closures, I would expect them to be bound to a kind of snapshot of their lexical environment at the moment they are defined. I.e., for the code below, I would expect f(3) to return a function (a closure) that always returns 7:

```python
def f(x):
    x2 = 2 * x + 1
    def g():
        return x2
    return g
```

This seems quite natural to me, as it is consistent with my general understanding of how interpreters work:
Each time the interpreter executes f and encounters def g, it creates a new code block by compiling the body of g and binds it to the local name 'g'. At the same time, since the body of g references x2, the value of x2 is not discarded by the garbage collector even after the scope of f is left. It remains intact as long as g, the only object referencing it, lives. So far, closures seem a natural consequence of the basic process of code interpretation and compilation.
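The mechanism can be inspected directly. This is a sketch assuming Python 3, using a version of f whose g returns x2 as the paragraph above describes. Strictly speaking, the body of g is compiled only once, together with f; each execution of `def g` creates a new function object, and the value of x2 is kept alive through a *cell* object referenced from that function's `__closure__` attribute:

```python
def f(x):
    x2 = 2 * x + 1
    def g():
        return x2
    return g

g = f(3)
print(g())                             # 7
print(g.__code__.co_freevars)          # ('x2',): names g takes from the enclosing scope
print(g.__closure__[0].cell_contents)  # 7: the cell that keeps x2 alive after f returned
```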
However, it turned out - and I learned it the hard way - that things are not that simple.
As it seems, a closure is not bound to a snapshot of its lexical environment at the moment it is defined, but rather at the end of the defining function. Or, to be more precise: at the points where the surrounding lexical scope is left. This includes return and yield statements. Hence, wrapping the construction of the closure in a "closure factory function", as exemplified in test3, restores the expected outcome.
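One way to see why the factory function in test3 changes the outcome is to compare the cell objects the closures refer to. This sketch assumes Python 3 and that each closure has exactly one free variable; the helper `distinct_cells` is hypothetical, written only for this illustration:

```python
def test2():
    calls = []
    for i in range(5):
        def call():
            return i
        calls.append(call)
    return calls

def test3():
    def MakeCall(i):
        def call():
            return i
        return call
    return [MakeCall(i) for i in range(5)]

def distinct_cells(funcs):
    # Count how many different cell objects the closures hold for their free variable.
    # The closures (and thus their cells) stay alive in 'funcs', so ids are not reused.
    return len({id(f.__closure__[0]) for f in funcs})

print(distinct_cells(test2()))  # 1: all five closures share a single cell for i
print(distinct_cells(test3()))  # 5: each MakeCall invocation has its own i
```

Because every closure from test2 shares one cell, all of them report whatever i holds when they are called; each call to MakeCall creates a fresh local i and therefore a fresh cell.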
I have no clue whether this behavior is by design or the result of over-optimization. I lean towards the latter. But intentional or not, the matter is worth rethinking, because one thing is certain:
If wrapping a piece of code into a function (a broadly used refactoring known as "Extract Method") modifies the behavior of the code, then that refactoring is no longer behavior-preserving. That is, it is not a refactoring at all...
Well, I don't know about you, but I don't like the taste of that.
Cheers and happy pitfall avoiding - to all of us.


Comments
I don't think it's a problem with closures in this case, but rather with how variables are defined in loops. In the first example, a closure is created when asked for, so the value of i at that time is used. In the second one, however, i is not created in the function scope but merely referenced, so by the end of the loop its value is 4, which would explain your result. In the third example, again, a new i is created for the scope of the closure and things are back to normal. Here's an example:
The id of i is the same everywhere. It's still the same in your first example, but what makes that one work is the fact that it's a generator.
Really? Changing "return i" to "return id(i)" produced this output (of course, the ids are not reproducible):
At the same time, printing id(i) gives me different ids in each loop cycle, in all cases.
So it is NOT the same for cases 1 and 3; the id is constant only in case 2.
I don't think that variables are defined differently in loops than outside of loops. For example,
behaves equally strange.
Anyway, the fact that the id does not change in case 2 supports my theory that python tries to bind closures to surrounding scope at the latest possible moment.
In the illustrated case (2), I think it's TOO LATE. I don't think that wrapping up some code in a function, like I did in case 3, should make so much difference. It is absolutely counter-intuitive (if not worse).
The scoping rules in Python are somewhat weird; I still think this is the cause of the problem, not how closures are made.
The same happens with list comprehensions. So, with that in mind, it's more or less clear what happens in your examples: one and two don't define a scope where i is a separate instance. ...though I could be mistaken...
I think Petr is correct. When i is evaluated in test2(), the loop has already ended, and i=4. However, this one works as expected, as it captures i on each iteration:
Hmm... I'm pretty much out of horizontal space here... continuing as extra comment below...
Hmm... (continued from response to "I think Petr is correct"). Jordan, please, look more carefully! When i is evaluated, it is NOT IN test2 AT ALL. It is OUTSIDE of test2, where i is not available at all! test2 simply returns the closures, but they are evaluated somewhere else. That's the benefit of closures.
Besides, in your code, 'call' is not a closure at all. You are not adding closures to the sequence, but tuples of plain function addresses and arguments, and you INTERPRET the tuples afterwards.
The defining property of a closure is that it is bound to the values of its surrounding environment: it is "closed" over those values (EVEN AFTER that environment goes out of scope, as is the case with local variables when the function exits).
The question is: WHEN is it closed?
My "anti-recipe" expresses the assumption that closures in Python are "closed" at the moment the surrounding scope is left, not at the moment the closure is encountered (i.e. defined). It also expresses my (strong) belief that closures should be closed when defined instead.
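The question can be probed with a small experiment: call the same closure twice without ever leaving the defining scope, rebinding the variable in between. This is a sketch assuming Python 3; the name `probe` is made up for the illustration:

```python
def probe():
    x = 1
    def g():
        return x
    first = g()   # called while x is bound to 1
    x = 2         # rebind x without leaving probe()
    second = g()  # same closure, called again inside probe()
    return first, second

print(probe())  # (1, 2)
```

A closure closed at definition time would return 1 both times, and the defining scope is never left between the two calls, yet the second call sees 2. The result is consistent with the reading that the closure keeps a reference to the variable and looks up its current value each time it is called.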
After all that's said, my assumption still holds. Whether or not my belief is correct... well, that's best left to history to judge ;)
Petr, is it really clear? A loop does not establish a separate scope in Python, so it is only natural that i is visible after the loop. The same is true of list comprehensions. The interpreter encounters an assignment to i, so it defines the variable. It sees the end of the loop, so it just ends the loop. It does not implicitly insert a "del i" at the end of the loop to undefine it. And fortunately so.
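A small sketch of the scoping behavior discussed here, assuming Python 3 (the function names are illustrative). Note that the remark about list comprehensions describes Python 2: since Python 3, a comprehension's loop variable neither leaks into nor overwrites a variable of the same name in the enclosing scope.

```python
def loop_scope():
    for i in range(3):
        pass
    return i  # a for loop creates no new scope, so i is still bound here (to 2)

def comprehension_scope():
    k = "outer"
    squares = [k * k for k in range(3)]  # this k is local to the comprehension
    return k, squares                    # Python 3: k is still "outer"

print(loop_scope())           # 2
print(comprehension_scope())  # ('outer', [0, 1, 4])
```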
All in all, I think scoping rules definitely have nothing to do with the phenomenon I've described in the "anti-recipe".
You're right... I realize that my example wasn't a "closure" (more of a functor pattern, though lispers might consider it a type of closure passing). I was only trying to illustrate Petr's comment that in that particular closure pattern in test2(), the context capture occurs after the loop, and so i=4. I probably just muddied the water rather than clarifying. Sorry about that.
Misunderstanding. Closures are over _local function variables_, never ever values. Even test1() "fails" if you wait for the function to end before calling the closures:
```python
def test1():
    for i in range(5):
        yield lambda: i
```

```
In [14]: [f() for f in test1()]
Out[14]: [0, 1, 2, 3, 4]

In [15]: map(apply, [f for f in test1()])
Out[15]: [4, 4, 4, 4, 4]
```
The functions all reference the same variable; the variable is just bound to different objects at the times when you call the closures.
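`apply` was removed in Python 3 and `map` now returns a lazy iterator, so the IPython session above does not run as written there. A Python 3 sketch of the same experiment:

```python
def test1():
    for i in range(5):
        yield lambda: i

# Call each closure as soon as the generator yields it.
eager = [f() for f in test1()]

# Exhaust the generator first, then call the closures.
funcs = list(test1())
late = [f() for f in funcs]

print(eager)  # [0, 1, 2, 3, 4]
print(late)   # [4, 4, 4, 4, 4]
```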
The best way to create such functions is to use the default argument trick:
```python
def test4():
    for i in range(5):
        yield lambda i=i: i
```

```
In [19]: [f() for f in test4()]
Out[19]: [0, 1, 2, 3, 4]

In [20]: map(apply, [f for f in test4()])
Out[20]: [0, 1, 2, 3, 4]
```
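In Python 3 the default-argument trick works unchanged. The sketch below also shows `functools.partial` as an alternative way to bind the current value; that variant is not from the comment above, and the name `test5` is hypothetical:

```python
import functools

def test4():
    for i in range(5):
        # i=i evaluates i right now and stores it as the lambda's default value
        yield lambda i=i: i

def test5():
    # Alternative: bind the current value of i as a fixed positional argument
    def call(i):
        return i
    return [functools.partial(call, i) for i in range(5)]

print([f() for f in list(test4())])  # [0, 1, 2, 3, 4]
print([f() for f in test5()])        # [0, 1, 2, 3, 4]
```

One trade-off: the default-argument trick changes the function's signature, so a caller can accidentally pass a value for i; the partial-based version raises TypeError if called with an extra argument.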