
Closures are powerful. Closures are beautiful. But: Closures are TRICKY!

This is an anti-recipe, a caveat about some obscure pitfalls with closures - or the way they are implemented in Python.

And now for the caveats: Two no-no's...

  1. Don't create more than one instance of the same closure per normal function!

  2. Don't create more than one instance of the same closure per generation cycle in a generator function!

Here is why...

'''
According to my expectations, the three functions in the code below should all
produce the same result. I've been stunned to discover that this is not the case.
'''

def test1():
   for i in range(5):
      def call(): return i
      yield call

def test2():
   all = []
   for i in range(5):
      def call(): return i
      all.append(call)
   return all

def test3():
   def MakeCall(i):
      def call(): return i
      return call

   all = []
   for i in range(5):
      all.append(MakeCall(i))
   return all


print('')
for test in [ test1, test2, test3 ]:
   print('%s : %s' % (test.__name__, [ f() for f in test() ]))

expected_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [0, 1, 2, 3, 4]
test3 : [0, 1, 2, 3, 4]
'''
actual_output = '''
test1 : [0, 1, 2, 3, 4]
test2 : [4, 4, 4, 4, 4] # <= this is the stunning thing !!!
test3 : [0, 1, 2, 3, 4]
'''

I've been using closures in all kinds of situations, and until recently, they always behaved in line with my expectations. The above result really stunned me... What the heck was going on here??? It seems that all closure instances constructed within test2 are closing on the value of i as it is at the end of the loop, regardless of their construction time and context. Now THAT was definitely NOT what I expected...

According to my understanding of closures, I would expect them to be bound to a kind of snapshot of their lexical environment at the moment they are defined. That is, for the code below I would expect f(3) to return a function (a closure) that always returns 7:

def f(x):
   x2 = 2 * x + 1
   def g(): return x2
   return g

This seems quite natural to me, as it is consistent with my general understanding of how interpreters work:

Each time the interpreter executes f and encounters def g, it creates a new code block by compiling the body of g and binds it to the local name 'g'. At the same time, since the body of g references x2, the value of x2 is not discarded by the garbage collector even after the scope of f is left. It remains intact as long as g, the only object that references it, lives. So far, closures seem a natural consequence of the basic interpretation and compilation process.
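A minimal sketch of the f/g example, assuming Python 3 and with g returning x2 so that f(3)() gives 7 as expected. It uses the standard function attributes `__closure__` and `__code__.co_freevars` to show what the closure object actually keeps alive after f has returned.

```python
def f(x):
    x2 = 2 * x + 1
    def g():
        # g refers to x2, so Python stores x2 in a cell shared with f
        return x2
    return g

g = f(3)
print(g())                               # 7
# The closure still references the cell holding x2 after f has returned.
print(g.__closure__[0].cell_contents)    # 7
# Only x2 is a free variable of g; x is never referenced inside g.
print(g.__code__.co_freevars)            # ('x2',)
```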

However, it turned out - and I learned it the hard way - that things are not that simple.

As it seems, a closure is not bound to a snapshot of its lexical environment at the moment it is defined, but rather at the end of the defining function. Or, to be more precise: at the points where the surrounding lexical scope is left, which include return and yield statements. Hence, wrapping the construction of the closure in a "closure factory function", as in test3, restores the expected outcome.

I have no clue whether this behavior is by design or the result of over-optimization. I lean toward the latter. But whether intentional or not, the matter is worth rethinking, because one thing is certain:

If wrapping up a piece of code into a function - a broadly used refactoring known as "Extract Method" - modifies the behavior of the code, then (that) refactoring is not behavior-preserving any more. That is, it is not a refactoring at all...

Well, I don't know about you, but I don't like the taste of that.

Cheers and happy pitfall avoiding - to all of us.

12 comments

Petr Solodov 17 years, 2 months ago

I don't think it's a problem with closures in this case, but rather with how variables are defined in loops. In the first example, a closure is created when asked for, so the value of i at that time is used. In the second one, however, i is not created in the function scope but only referenced, so at the end of the loop its value is 4, which would explain your result. In the third example, again, a new i is created for the scope of the closure and things are back to normal. Here's an example:

def test2():
   all = []
   for i in range(5):
      def call(): return id(i)
      all.append(call)
   print(id(i))
   return all

print('%s : %s' % (test2.__name__, [ f() for f in test2() ]))

The id of i is the same everywhere. It's also the same in your first example, but what makes that one work is the fact that it's a generator.

Zoran Isailovski (author) 17 years, 2 months ago

Really? Changing "return i" to "return id(i)" produced this output (of course the ids are not reproducible):

test1 : [3104596, 3104584, 3104572, 3104560, 3104548]
test2 : [3104548, 3104548, 3104548, 3104548, 3104548]
test3 : [3104596, 3104584, 3104572, 3104560, 3104548]

At the same time, printing id(i) gives me different ids in each loop cycle, in all cases.

So it is NOT the same for cases 1 and 3; the id is constant only in case 2.

I don't think that variables are defined in loops differently than outside of loops. For example,

def test2a():
   all = []
   i = 0
   def call(): return id(i)
   all.append(call)
   i += 1
   def call(): return id(i)
   all.append(call)
   # ... etc.
   return all

behaves just as strangely.

Anyway, the fact that the id does not change in case 2 supports my theory that python tries to bind closures to surrounding scope at the latest possible moment.

In the illustrated case (2), I think it's TOO LATE. I don't think that wrapping up some code in a function, like I did in case 3, should make so much difference. It is absolutely counter-intuitive (if not worse).

Petr Solodov 17 years, 2 months ago

The scoping rules in Python are somewhat weird; I still think this is the cause of the problem, not how closures are made.

for i in range(10):
  print(i)
# i is still defined over here
print(i)

The same happens with list comprehensions. So with that in mind, it's more or less clear what happens in your examples: examples one and two don't define a scope where i is a separate instance. ...though I could be mistaken...
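A small sketch of the scoping point, assuming Python 3 (the function names are made up for illustration). Note that list comprehensions behave differently there than in the Python 2 of this discussion: they get their own scope, so their loop variable no longer leaks.

```python
def loop_then_use():
    for i in range(10):
        pass
    # A for loop does not open a new scope: i is still bound here, to 9.
    return i

def comprehension_then_use():
    i = 'outer'
    squares = [i * i for i in range(10)]
    # In Python 3 the comprehension has its own scope, so the outer i is
    # untouched. (In Python 2 the first element returned would be 9.)
    return i, squares[-1]

print(loop_then_use())            # 9
print(comprehension_then_use())   # ('outer', 81)
```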

Jordan Callicoat 17 years, 2 months ago

I think Petr is correct. When i is evaluated in test2(), it's already at the end of the loop, and i=4. However, this one works as expected, as it captures i on each iteration:

def test2():
   all = []
   for i in range(5):
      def call(j): return j
      all.append((call, i))
   return all
print('%s : %s' % (test2.__name__, [ f(a) for f,a in test2() ]))

Zoran Isailovski (author) 17 years, 2 months ago

Hmm... I'm pretty much out of horizontal space here... continuing as extra comment below...

Zoran Isailovski (author) 17 years, 2 months ago

Hmm... (continued from response to "I think Petr is correct"). Jordan, please, look more carefully! When i is evaluated, it is NOT IN test2 AT ALL. It is OUTSIDE of test2, where i is not available at all! test2 simply returns the closures, but they are evaluated somewhere else. That's the benefit of closures.

Besides, in your code, 'call' is not a closure at all. You are not adding closures to the sequence, but tuples of plain function addresses and arguments, and you INTERPRET the tuples afterwards.

The defining trait of a closure is that it is bound to values of its surrounding environment; it is "closed" on those values (EVEN AFTER that environment goes out of scope, as is the case with local variables when the function exits).

The question is: WHEN is it closed?

My "anti-recipe" expresses the assumption that closures in Python are "closed" at the moment when the surrounding scope is left, not at the moment the closure is encountered (i.e. defined). It also expresses my (strong) belief that closures should be closed when defined instead.

After all that's said, my assumption still holds. Whether or not my belief is correct... well, that's best left to history to judge ;)

Zoran Isailovski (author) 17 years, 2 months ago

Petr, is it really clear? A loop does not establish a separate scope in Python, so it is only natural that i is visible after the loop. The same is true of list comprehensions. The interpreter encounters an assignment to i, so it defines the variable. It sees the end of the loop, so it just ends the loop. It does not implicitly insert a "del i" at the end of the loop to undefine it. And fortunately so.

All in all, I think scoping rules definitely have nothing to do with the phenomenon I've described in the "anti-recipe".

Jordan Callicoat 17 years, 2 months ago

You're right... I realize that my example wasn't a "closure" (more of a functor pattern, though lispers might consider it a type of closure passing). I was only trying to illustrate Petr's comment that in that particular closure pattern in test2(), the context capture occurs after the loop, and so i=4. I probably just muddied the water rather than clarifying. Sorry about that.

Mike Klaas 17 years ago

Misunderstanding. Closures are over _local function variables_, never ever values. Even test1() "fails" if you wait for the function to end before calling the closures:

def test1():
   for i in range(5):
      yield lambda: i

In [14]: [f() for f in test1()]
Out[14]: [0, 1, 2, 3, 4]

In [15]: map(apply, [f for f in test1()])
Out[15]: [4, 4, 4, 4, 4]

The functions all reference the same variable; the variable is just bound to a different object depending on when you call the closures.
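Mike's point about variables can be made visible by looking at the closure cells. This sketch, assuming Python 3, rebuilds test2 (with the list renamed to funcs so the builtin all stays usable) and checks that every closure holds one and the same cell for i.

```python
def test2():
    funcs = []
    for i in range(5):
        def call():
            return i
        funcs.append(call)
    return funcs

funcs = test2()
cells = [f.__closure__[0] for f in funcs]
# Every closure holds the very same cell object for the variable i...
print(all(c is cells[0] for c in cells))   # True
# ...and that cell's current content is the last value bound to i.
print(cells[0].cell_contents)              # 4
print([f() for f in funcs])                # [4, 4, 4, 4, 4]
```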

The best way to create such functions is to use the default argument trick:

def test4():
   for i in range(5):
      yield lambda i=i: i

In [19]: [f() for f in test4()]
Out[19]: [0, 1, 2, 3, 4]

In [20]: map(apply, [f for f in test4()])
Out[20]: [0, 1, 2, 3, 4]

Devon Sean McCullough 12 years, 4 months ago

Mike Klaas gets it, a closure is NOT an immutable snapshot, just as a variable assignment is NOT a mathematical equation.

These Nasty Variables - Caveats for the Variable Enthusiast (Python recipe)

Variables are powerful. Variables are beautiful. But: Variables are TRICKY!

This is an anti-recipe, a caveat about some obscure pitfalls with variables - or the way they are implemented in python.

And now for the caveats: Two no-no's...

  1. Don't assign more than one value to the same variable per normal function!

  2. Don't assign more than one value to the same variable per generation cycle in a generator function!

Here is why...

Zoran Isailovski (author) 12 years, 4 months ago

It is still unexplained why the 3 functions should yield different results (not why they do yield them).

My premise was, the 3 functions are semantically equivalent, hence they should yield the same results. If they are not, then the underlying programming language implementation that makes them behave differently is wrong.

Since the above logical conclusion is perfectly beyond questioning, the only reasonable form of counter-argument is to question my premise. And I have yet to read a plausible and consistent argument why the 3 functions are not semantically equivalent.

Mike Klaas comes close, but IMO his explanation doesn't cover test3.

I included test3 to test variable vs. value binding. In test3, the local function MakeCall is declared once and reused within the loop. According to Mike's theory, the function call is hence bound to the same variable, MakeCall's argument i. The fact that it is bound to different values with each call should not matter then, so test3 should behave like test2. But it doesn't! It behaves as if call in test3 were closed over the value of i, not the variable.
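One way to probe the variable-vs-value question for test3 is to compare the cells of the closures, as in this Python 3 sketch (test3 reproduced with the list renamed to funcs). If each call to MakeCall creates a fresh local i, each closure should get its own cell rather than sharing one:

```python
def test3():
    def MakeCall(i):
        def call():
            return i
        return call
    funcs = []
    for i in range(5):
        funcs.append(MakeCall(i))
    return funcs

funcs = test3()
cells = [f.__closure__[0] for f in funcs]
# Each call to MakeCall creates a fresh local i, hence a fresh cell.
print(len(set(map(id, cells))))           # 5
print([c.cell_contents for c in cells])   # [0, 1, 2, 3, 4]
```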

In fact, the whole scenario in my "anti-recipe" is about testing that hypothesis: does a closure close over variables or values? But while test2 indicates closure over variables, as Mike suggests, test3 indicates closure over values. I found that confusing.

That's at least how I remember my stream of thoughts after all these years. It would still be nice to have a crystal-clear guideline on closures, or even better a language construct that doesn't create confusion and seemingly contradictory behaviors in the first place, but oh well.

@Devon Sean McCullough: Sarcasm is the lowest form of wit!

Jack O'Connor 10 years, 2 months ago

The answer is that your three functions aren't semantically equivalent at all. There are two rather subtle reasons why.

test1 vs test2

As Mike Klaas pointed out, closures happen over variables, not values. Another way of saying that is, Python never ever takes a snapshot of your variables when you make a closure.

That means that in both test1 and test2, your inner call() function is closing on a variable i whose value can and will change. So the value of i depends on when you invoke call(). In test1, because you yield out in the middle of your for loop, call() gets invoked before the value of i changes. In test2, because you finish the for loop first, all your invocations of call() see the final value of i.

If you put some print statements in call() and in your for loops, you will see the clear difference in the order that everything is happening in these two tests.
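Rather than print statements, this sketch records events in a list so the ordering can be checked; the `log` parameter is a hypothetical addition to test1, and Python 3 is assumed.

```python
def test1(log):
    for i in range(5):
        log.append('loop i=%d' % i)
        def call():
            log.append('call sees i=%d' % i)
            return i
        yield call

# Calling each closure as soon as it is yielded (what the original test does):
eager_log = []
eager = [f() for f in test1(eager_log)]

# Exhausting the generator first and calling afterwards (like test2):
lazy_log = []
fs = list(test1(lazy_log))
lazy = [f() for f in fs]

print(eager)          # [0, 1, 2, 3, 4]
print(lazy)           # [4, 4, 4, 4, 4]
print(eager_log[:4])  # ['loop i=0', 'call sees i=0', 'loop i=1', 'call sees i=1']
print(lazy_log[4:7])  # ['loop i=4', 'call sees i=4', 'call sees i=4']
```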

test2 vs test3

Function definitions don't snapshot any variables for you, but function calls do, sort of. When a function is called, it creates local variables to hold its arguments. You can clearly see this happening if you use different variable names in the definition and the call. If you have "def f(foo):" and you call "f(bar)", the value of bar gets copied into the local variable foo when f is called. Reassigning bar later won't have any effect on foo. Also, calling f again while f is running won't affect foo either, because each call gets its own separate local scope. (But see the CAVEAT below...)
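The foo/bar description can be sketched in a few lines (Python 3 assumed; f, foo and bar are the illustrative names from the paragraph above, and the inner show function is added only to observe foo later):

```python
def f(foo):
    # foo is a new local variable of this call, initialized from the argument.
    def show():
        return foo
    return show

bar = 1
g = f(bar)   # foo (inside this call) is bound to 1
bar = 2      # rebinding bar does not touch that foo
h = f(bar)   # a second call gets its own, separate foo

print(g(), h())   # 1 2
```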

Confusingly, in test3, you have an inner variable (the argument to MakeCall) and outer variable (in your for loop) that are both called i. But they are different variables! The inner i is assigned its value when MakeCall is invoked, and it lives in the function call scope. Because you return a closure from that scope, the inner i is preserved, and the outer i is not visible to your closure.

To make it clear what's going on, change the name of the argument in the declaration of MakeCall. Name it x or something. Now if you still have call() return the variable i, it should be clear that it's returning a variable from an outer scope, whose value is changing, and you will start seeing the same unexpected results as from test2.
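The renaming experiment might look like this; `test3_renamed` is a hypothetical name, and Python 3 is assumed. Since MakeCall's parameter x is never used, call() closes over the loop variable i of the enclosing function, just as in test2:

```python
def test3_renamed():
    def MakeCall(x):
        # x is unused; call() now refers to the loop variable i of
        # test3_renamed, not to a parameter of MakeCall.
        def call():
            return i
        return call

    funcs = []
    for i in range(5):
        funcs.append(MakeCall(i))
    return funcs

print([f() for f in test3_renamed()])   # [4, 4, 4, 4, 4]
```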

This is actually a common technique when you need to create closures in a for loop. You can have your function take an argument whose default value is the variable you're closing on; the default is evaluated when the function is defined, so it captures the variable's current value at that moment.
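Applied to the list-building test2, the default-argument technique might look like this (Python 3 assumed; `test2_default` is a hypothetical name):

```python
def test2_default():
    funcs = []
    for i in range(5):
        # The default value is evaluated once, when def runs, so each
        # call() stores the value i had during that iteration.
        def call(i=i):
            return i
        funcs.append(call)
    return funcs

print([f() for f in test2_default()])   # [0, 1, 2, 3, 4]
```

One trade-off of this design is that i becomes a real parameter: a caller can override it, so passing 10 to any of these functions returns 10 instead of the captured value.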

CAVEAT: If foo and bar are referring to a mutable object like a list, changes to the contents of that object will be reflected in both. It's only reassignment that function arguments are protected from.
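A short Python 3 sketch of the caveat, reusing the illustrative names f, foo and bar: mutating the list through foo is visible through bar, while rebinding foo is not.

```python
def f(foo):
    foo.append('added inside f')   # mutates the list object bar also refers to
    foo = ['rebound inside f']     # rebinds only the local name foo
    return foo

bar = []
result = f(bar)
print(bar)      # ['added inside f']
print(result)   # ['rebound inside f']
```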

Created by Zoran Isailovski on Sat, 3 Mar 2007 (PSF)