Welcome, guest | Sign In | My Account | Store | Cart

Often, when working with a dictionary D, you need to use the entry D[k] if it's already present, or add a new D[k] if k wasn't a key into D. The setdefault method of dictionaries is a very handy shortcut for this task.

Python, 22 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# say we're building a word->pagenumbers index -- a key piece of code might be:

theIndex = {}
def addword(word, pagenumber):
    if theIndex.has_key(word):
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

# incidentally, a good Pythonic instinct would be to substitute this
# "look before you leap" pattern with a "easier to get permission":

def addword(word, pagenumber):
    try: theIndex[word].append(pagenumber)
    except AttributeError: theIndex[word] = [pagenumber]

# but this is by the by -- just a minor simplification.  However,
# this meets the pattern "use the entry if already present, else
# add a new entry".  Here's how using setdefault simplifies this:

def addword(word, pagenumber):
    theIndex.setdefault(word,[]).append(pagenumber)

Basically, dict.setdefault(k,v) is much like dict.get(k,v), except that, if not dict.haskey(k), the setdefault methods assign dict]=v as well as returning v (while get would just return v, without affecting dict in any way). Therefore, setdefault is appropriate any time you have get-like needs but also need to have this side-effect on the dictionary.

setdefault is particularly useful for the very common data structure that is a dictionary whose values are lists, and the single most typical usage form for it is somedict.setdefault(somekey,[]).append(somevalue).

Note that setdefault is normally not useful if the values are immutable. If you just want to count words, for example, theIndex.setdefault(word,1) is not very useful -- rather, use theIndex[word] = 1 + theIndex.get(word,0).

5 comments

Martin Miller 21 years, 3 months ago  # | flag

Better? I'm curious to hear how this technique compares to another common idiom I've seen, which for the example would be:

def addword(word, pagenumber):
    theIndex[word] = theIndex.get(word, []) + [pagenumber]

TIA

Matthew Shomphe 20 years, 11 months ago  # | flag

I had the same question.... So I ran the following code. I have yet to get a result for dict_2(). It was running for far too long. dict_1() takes about .7 seconds.

# test.py
WORDS = ['this', 'that', 'other']
def dict_1():
    d = {}
    for x in xrange(0, 100000):
        word = WORDS[x%3]
        d.setdefault(word, []).append(x)
def dict_2():
    d = {}
    for y in xrange(0, 100000):
        word2 = WORDS[y%3]
        d[word2] = d.setdefault(word2, []) + [y]

if __name__ == '__main__':
    import profile
    profile.run('dict_1()')
    profile.run('dict_2()')
Matthew Shomphe 20 years, 11 months ago  # | flag

Some specifics. Here are some specifics on the test I ran above:

      3 function calls in 0.752 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.038    0.038    0.690    0.690 :1(?)
     1    0.653    0.653    0.653    0.653 dictTest.py:2(dict_1)
     1    0.061    0.061    0.752    0.752 profile:0(dict_1())
     0    0.000             0.000          profile:0(profiler)

======================================================================

      3 function calls in 551.600 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.026    0.026  551.518  551.518 :1(?)
     1  551.492  551.492  551.492  551.492 dictTest.py:7(dict_2)
     1    0.082    0.082  551.600  551.600 profile:0(dict_2())
     0    0.000             0.000          profile:0(profiler)

As you can see, there's a massive difference.

My system:

Python 2.3b1

Windows 2K

PIII 1.13 GHz

buck 14 years, 5 months ago  # | flag

Does the [] get evaluated every time the function is called, thus creating a new list that's passed to the func- tion? Or is there some optimization to keep a new list from being created (or from it costing too much at the implementation level, like copy-on-write) if unneeded? Cause if you use a constructor call, like in the below, there's definitely no magic shortcut happening

class noisy_list_wrapper(list):
    invoked = 0
    def __init__(self):
        self.__class__.invoked += 1
        super(self.__class__, self).__init__(self)

WORDS = ['this', 'that', 'other']

def dict_1():
    d = {}
    for x in xrange(0, 100000):
        word = WORDS[x%3]
        d.setdefault(word, noisy_list_wrapper()).append(x)

dict_1()
print noisy_list_wrapper.invoked

===> 100000
thefourtheye 10 years, 2 months ago  # | flag

Latest versions of Python have collections.defaultdict for this very purpose

Created by Alex Martelli on Tue, 14 Aug 2001 (PSF)
Python recipes (4591)
Alex Martelli's recipes (27)

Required Modules

  • (none specified)

Other Information and Tasks