Closures are powerful. Closures are beautiful. But: Closures are TRICKY!
This is an anti-recipe, a caveat about some obscure pitfalls with closures - or the way they are implemented in Python.
And now for the caveats: Two no-no's...
Don't create more than one instance of the same closure per normal function!
Don't create more than one instance of the same closure per generation cycle in a generator function!
Here is why...
<pre>
'''
According to my expectations, the three functions in this code below should all
produce the same result. I've been stunned to discover that this is not the case.
'''

def test1():
    for i in range(5):
        def call(): return i
        yield call

def test2():
    all = []
    for i in range(5):
        def call(): return i
        all.append(call)
    return all

def test3():
    def MakeCall(i):
        def call(): return i
        return call
    all = []
    for i in range(5):
        all.append(MakeCall(i))
    return all

print
for test in [ test1, test2, test3 ]:
    print test.__name__, ':', [ f() for f in test() ]

expected_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [0, 1, 2, 3, 4]
test3 : [0, 1, 2, 3, 4]
'''

actual_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [4, 4, 4, 4, 4]   # <= this is the stunning thing !!!
test3 : [0, 1, 2, 3, 4]
'''
</pre>
I've been using closures in all kinds of situations, and until recently, they always used to behave in line with my expectations. The above result really stunned me... What the heck was going on here??? It seems that all closure instances constructed within test2 are closing on the value of i as it is at the end of the loop, regardless of their construction time and context. Now THAT was definitely NOT what I expected...
According to my understanding of closures, I would expect that they are bound to a kind of snapshot of their lexical environment at the moment they are defined. I.e., for the code below I would expect f(3) to return a function (a closure) that always returns 7.
<pre>
def f(x):
    x2 = 2 * x + 1
    def g(): return x2
    return g
</pre>
This seems quite natural to me, as this is in congruence with my general understanding of how interpreters work:
Each time the interpreter executes f and encounters def g, it creates a new function object from the compiled body of g and binds it to the local name 'g'. At the same time, since the body of g references x2, the value of x2 is not discarded by the garbage collector even after the scope of f is left. It remains intact as long as g - the only thing still referencing it - lives. So far, closures seem a natural consequence of how code is interpreted and compiled.
However, it turned out - and I learned it the hard way - that things are not that simple.
As it seems, a closure is not bound to a snapshot of its lexical environment at the moment it is defined, but rather at the end of the defining function. Or, to be more precise: at points where the surrounding lexical scope is left. This includes return and yield statements. Hence, wrapping the construction of the closure in a "closure factory function", as exemplified in test3, restores the expected outcome.
I have no clue whether this behavior is by definition or the result of an over-optimization. I am inclined toward the latter. But be it intentional or not, the matter is worth rethinking, because one thing is certain:
If wrapping up a piece of code into a function - a broadly used refactoring known as "Extract Method" - modifies the behavior of the code, then (that) refactoring is not behavior-preserving any more. That is, it is not a refactoring at all...
Well, I don't know about you, but I don't like the taste of that.
Cheers and happy pitfall avoiding - to all of us.
I don't think it's a problem with closures in this case, but rather with how variables are defined in loops. In the first example a closure is created when asked, so the value of i is used at that time. In the second one, however, i is not created in the function scope, but rather referenced, so at the end of the loop the value is 4, which would explain your result. In the third example again, a new i is created for the scope of the closure and things are back to normal, here's an example:
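(A sketch of the point being made - the original snippet is not preserved, so the values here are illustrative:)
<pre>
def MakeCall(i):
    def call(): return i
    return call

i = 100          # an outer i
c = MakeCall(5)  # MakeCall gets its own, fresh i for this call
print c()        # 5 - not 100
</pre>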
id of i is the same everywhere. It's still the same in your example one, but what makes it work is the fact that it's a generator.
Really? Changing "return i" to "return id(i)" produced this output (of course the ids are not reproducible):
At the same time, printing id(i) gives me different ids in each loop cycle, in all cases.
So it is NOT the same for cases 1 and 3; the id is constant only in case 2.
I don't think that variables are defined in loops differently than outside of loops. For example, code along the lines of the sketch below behaves equally strangely.
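A minimal sketch (illustrative names, not the original snippet): two closures over the same variable, assigned twice with no loop in sight, both end up seeing the last assignment.
<pre>
def test_no_loop():
    fs = []
    i = 0
    def call(): return i
    fs.append(call)
    i = 1
    def call(): return i
    fs.append(call)
    return fs

print [f() for f in test_no_loop()]   # [1, 1] - both closures see the final value of i
</pre>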
Anyway, the fact that the id does not change in case 2 supports my theory that Python tries to bind closures to the surrounding scope at the latest possible moment.
In the illustrated case (2), I think it's TOO LATE. I don't think that wrapping up some code in a function, like I did in case 3, should make so much difference. It is absolutely counter-intuitive (if not worse).
The scoping rules in python are somewhat weird, I still think this is the cause of the problem, not how closures are made.
The same happens with list comprehensions. So with that in mind, it's more or less clear what happens in your examples: one and two don't define a scope where i is a separate instance. ...though I could be mistaken...
I think Petr is correct. When i is evaluated in test2(), it's already at the end of the loop, and i=4. However, this one works as expected, as it captures i on each iteration:
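Roughly along these lines (a sketch, not the exact original): the function is stored together with its argument and applied later.
<pre>
def test2b():
    all = []
    for i in range(5):
        all.append((lambda x: x, i))   # store the function and its argument as a pair
    return all

print [f(arg) for f, arg in test2b()]   # [0, 1, 2, 3, 4]
</pre>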
Hmm... (continued from response to "I think Petr is correct"). Jordan, please, look more carefully! When i is evaluated, it is NOT IN test2 AT ALL. It is OUTSIDE of test2, where i is not available at all! test2 simply returns the closures, but they are evaluated somewhere else. That's the benefit of closures.
Besides, in your code, 'call' is not a closure at all. You are not adding closures to the sequence, but tuples of simple function addresses and arguments, and you INTERPRET the tuples afterwards.
The specific thing about a closure is that it is bound to values of its surrounding environment - it is "closed" on those values (EVEN AFTER that environment gets out of scope, as is the case with local variables when the function exits).
The question is: WHEN is it closed?
My "anti-recipe" expresses the assumtion that closures in python are "closed" at the moment when the surrounding scope is left, not at the moment the closure is encontered (i.e. defined). It also expresses my (strong) belief that closures schould be closed when defined instead.
After all that's said, my assumption still holds. Whether or not my belief is correct... well, that's best left to history to judge ;)
Petr, is it really clear? A loop does not establish a separate scope in Python, so it is only natural that i is visible after the loop. The same is true about list comprehensions. The interpreter encounters an assignment to i - it defines the variable. It sees the end of the loop - it just ends the loop. It does not implicitly insert a "del i" at the end of the loop to undefine it. And fortunately so.
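A quick illustration:
<pre>
for i in range(5):
    pass
print i   # 4 - the loop variable is still defined after the loop ends
</pre>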
All in all, I think scoping rules definitely have nothing to do with the phenomenon I've described in the "anti-recipe".
You're right... I realize that my example wasn't a "closure" (more of a functor pattern, though lispers might consider it a type of closure passing). I was only trying to illustrate Petr's comment that in that particular closure pattern in test2(), the context capture occurs after the loop, and so i=4. I probably just muddied the water rather than clarifying. Sorry about that.
Misunderstanding. Closures are over _local function variables_, never ever values. Even test1() "fails" if you wait for the function to end before calling the closures:
<pre>
def test1():
    for i in range(5):
        yield lambda: i
</pre>
<pre>
In [14]: [f() for f in test1()]
Out[14]: [0, 1, 2, 3, 4]

In [15]: map(apply, [f for f in test1()])
Out[15]: [4, 4, 4, 4, 4]
</pre>
The functions all reference the same variable; the variable is just bound to different objects at the time you call the closures.
The best way to create such functions is to use the default argument trick:
<pre>
def test4():
    for i in range(5):
        yield lambda i=i: i
</pre>
<pre>
In [19]: [f() for f in test4()]
Out[19]: [0, 1, 2, 3, 4]

In [20]: map(apply, [f for f in test4()])
Out[20]: [0, 1, 2, 3, 4]
</pre>
Mike Klaas gets it, a closure is NOT an immutable snapshot, just as a variable assignment is NOT a mathematical equation.
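One way to see this directly is to inspect the closure cells of the functions the recipe builds (a sketch; __closure__ and cell_contents are CPython introspection attributes):
<pre>
fs2 = test2()   # using test2 and test3 from the recipe above
fs3 = test3()

# All five functions from test2 share one and the same cell for i...
print fs2[0].__closure__[0] is fs2[4].__closure__[0]    # True
print [f.__closure__[0].cell_contents for f in fs2]     # [4, 4, 4, 4, 4]

# ...while each function from test3 got its own cell from its own MakeCall invocation.
print fs3[0].__closure__[0] is fs3[4].__closure__[0]    # False
print [f.__closure__[0].cell_contents for f in fs3]     # [0, 1, 2, 3, 4]
</pre>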
These Nasty Variables - Caveats for the Variable Enthusiast (Python recipe)
Variables are powerful. Variables are beautiful. But: Variables are TRICKY!
This is an anti-recipe, a caveat about some obscure pitfalls with variables - or the way they are implemented in python.
And now for the caveats: Two no-no's...
Don't assign more than one value to the same variable per normal function!
Don't assign more than one value to the same variable per generation cycle in a generator function!
Here is why...
It is still unexplained why the 3 functions should yield different results (not why they do yield them).
My premise was, the 3 functions are semantically equivalent, hence they should yield the same results. If they are not, then the underlying programming language implementation that makes them behave differently is wrong.
Since the above logical conclusion is perfectly beyond questioning, the only reasonable form of counter-argument is to question my premise. And I have yet to read a plausible and consistent argument why the 3 functions are not semantically equivalent.
Mike Klaas comes close, but IMO his explanation doesn't cover test3. (I included test3 to test variable vs. value binding.) In test3, the local function MakeCall is declared once and reused within the loop. According to Mike's theory, call is hence bound to the same variable - MakeCall's argument i. The fact that it is bound to different values with each call should not matter then, so test3 should behave like test2. But it doesn't! It behaves as if call in test3 was closed over the value of i, not the variable.
In fact, the whole scenario in my "anti-recipe" is about testing that hypothesis: does a closure close over variables or over values? But while test2 indicates closure over variables, like Mike suggests, test3 indicates closure over values. I found that confusing.
That's at least how I remember my stream of thoughts after all these years. It would still be nice to have a crystal-clear guideline on closures, or even better a language construct that doesn't create confusion and seemingly contradicting behaviors in the first place, but oh well.
@Devon Sean McCullough: Sarcasm is the lowest form of wit!
The answer is that your three functions aren't semantically equivalent at all. There are two rather subtle reasons why.
test1 vs test2
As Mike Klaas pointed out, closures happen over variables, not values. Another way of saying that is, Python never ever takes a snapshot of your variables when you make a closure.
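A minimal sketch (made-up names) of what that means:
<pre>
def outer():
    x = 1
    def inner(): return x   # inner closes over the variable x, not over the value 1
    x = 2
    return inner

print outer()()   # 2 - the closure sees the later assignment
</pre>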
That means that in both test1 and test2, your inner call() function is closing on a variable i whose value can and will change. So the value of i depends on when you invoke call(). In test1, because you yield out in the middle of your for loop, call() gets invoked before the value of i changes. In test2, because you finish the for loop first, all your invocations of call() see the final value of i.
If you put some print statements in call() and in your for loops, you will see the clear difference in the order that everything is happening in these two tests.
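For instance, a sketch of that instrumentation (the _traced names are made up):
<pre>
def test1_traced():
    for i in range(5):
        def call():
            print 'call() runs, i is', i
            return i
        print 'yield, i is', i
        yield call

def test2_traced():
    fs = []
    for i in range(5):
        def call():
            print 'call() runs, i is', i
            return i
        print 'append, i is', i
        fs.append(call)
    return fs

print [f() for f in test1_traced()]   # each 'yield' line is followed right away by 'call() runs' with the same i
print [f() for f in test2_traced()]   # all five 'append' lines come first, then 'call() runs, i is 4' five times
</pre>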
test2 vs test3
Function definitions don't snapshot any variables for you, but function calls do, sort of. When a function is called, it creates local variables to hold its arguments. You can clearly see this happening if you use different variable names in the definition and the call. If you have "def f(foo):" and you call "f(bar)", the value of bar gets copied into the local variable foo when f is called. Reassigning bar later won't have any effect on foo. Also, calling f again while f is running won't affect foo either, because each call gets its own separate local scope. (But see the CAVEAT below...)
Confusingly, in test3, you have an inner variable (the argument to MakeCall) and outer variable (in your for loop) that are both called i. But they are different variables! The inner i is assigned its value when MakeCall is invoked, and it lives in the function call scope. Because you return a closure from that scope, the inner i is preserved, and the outer i is not visible to your closure.
To make it clear what's going on, change the name of the argument in the declaration of MakeCall. Name it x or something. Now if you still have call() return the variable i, it should be clear that it's returning a variable from an outer scope, whose value is changing, and you will start seeing the same unexpected results from test2.
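Roughly, the renamed version would look like this (a sketch):
<pre>
def test3_renamed():
    def MakeCall(x):              # argument renamed; call() still returns the outer i
        def call(): return i
        return call
    all = []
    for i in range(5):
        all.append(MakeCall(i))
    return all

print [f() for f in test3_renamed()]   # [4, 4, 4, 4, 4] - the same surprise as in test2
</pre>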
Passing the loop variable in as a function argument, as MakeCall does, is actually a common technique when you need to create closures in a for loop. You can also have your function take an argument whose default value is the variable you're closing on, and that will capture its value at the time the function is defined.
CAVEAT: If foo and bar are referring to a mutable object like a list, changes to the contents of that object will be reflected in both. It's only reassignment that function arguments are protected from.
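A sketch illustrating both points (made-up names):
<pre>
def f(foo):
    def g(): return foo
    return g

bar = [1, 2]
g = f(bar)
bar.append(3)    # mutating the object in place IS visible through foo
print g()        # [1, 2, 3]
bar = [99]       # but rebinding the name bar is not
print g()        # [1, 2, 3]
</pre>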