ActiveState Code

Recipe 66516: Add an entry to a dictionary, unless the entry is already there


Often, when working with a dictionary D, you need to use the entry D[k] if it's already present, or add a new D[k] if k wasn't a key into D. The setdefault method of dictionaries is a very handy shortcut for this task.

Python
# say we're building a word->pagenumbers index -- a key piece of code might be:

theIndex = {}
def addword(word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

# incidentally, a good Pythonic instinct would be to substitute this
# "look before you leap" pattern with "easier to ask forgiveness than
# permission":

def addword(word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        # a missing key raises KeyError (not AttributeError)
        theIndex[word] = [pagenumber]

# but this is by the by -- just a minor simplification.  However,
# this meets the pattern "use the entry if already present, else
# add a new entry".  Here's how using setdefault simplifies this:

def addword(word, pagenumber):
    theIndex.setdefault(word,[]).append(pagenumber)

Discussion

Basically, dict.setdefault(k,v) is much like dict.get(k,v), except that, if k is not already a key in dict, the setdefault method assigns dict[k]=v as well as returning v (while get would just return v, without affecting dict in any way). Therefore, setdefault is appropriate any time you have get-like needs but also need this side effect on the dictionary.
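The difference between the two methods can be sketched as follows. This is a minimal illustration, not part of the original recipe; the dictionary `d` and the variable names are made up for the example.

```python
# Illustrative sketch contrasting dict.get and dict.setdefault.
d = {}

# get returns the default but leaves the dictionary untouched
value_from_get = d.get("k", [])
keys_after_get = list(d)            # still empty

# setdefault returns the default AND stores it under the missing key
value_from_setdefault = d.setdefault("k", [])
keys_after_setdefault = list(d)     # now contains "k"

# the returned object is the very one stored in the dictionary,
# so mutating it in place is visible through the dictionary
value_from_setdefault.append(1)

# for a key that already exists, setdefault ignores the default
existing = d.setdefault("k", ["ignored"])
```

Because setdefault hands back the stored object itself, an in-place method such as append can be chained directly onto the call.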

setdefault is particularly useful for the very common data structure that is a dictionary whose values are lists, and the single most typical usage form for it is somedict.setdefault(somekey,[]).append(somevalue).

Note that setdefault is normally not useful if the values are immutable. If you just want to count words, for example, theIndex.setdefault(word,1) is not very useful -- rather, use theIndex[word] = 1 + theIndex.get(word,0).
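A small sketch of the counting case, assuming a plain list of words as input (the function names are hypothetical). Integers cannot be changed in place, so the value returned by setdefault cannot be updated through a method call the way a list can; the key has to be rebound either way.

```python
def count_words(words):
    """Count occurrences using get, as the text recommends."""
    counts = {}
    for word in words:
        counts[word] = 1 + counts.get(word, 0)
    return counts

def count_words_setdefault(words):
    """Same result with setdefault, but it saves nothing."""
    counts = {}
    for word in words:
        counts.setdefault(word, 0)   # only guarantees the key exists
        counts[word] += 1            # the rebinding is still required
    return counts
```

Both functions build the same dictionary; the get version simply does it in one statement per word.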

Comments

  1. At 5:11 p.m. on 27 jan 2003, Martin Miller said:

    Better? I'm curious to hear how this technique compares to another common idiom I've seen, which for the example would be:

    def addword(word, pagenumber):
        theIndex[word] = theIndex.get(word, []) + [pagenumber]
    

    TIA

  2. At 5:13 p.m. on 20 may 2003, Matthew Shomphe said:

    I had the same question.... So I ran the following code. I have yet to get a result for dict_2(). It was running for far too long. dict_1() takes about .7 seconds.

    # test.py
    WORDS = ['this', 'that', 'other']
    def dict_1():
        d = {}
        for x in xrange(0, 100000):
            word = WORDS[x%3]
            d.setdefault(word, []).append(x)
    def dict_2():
        d = {}
        for y in xrange(0, 100000):
            word2 = WORDS[y%3]
            d[word2] = d.setdefault(word2, []) + [y]
    
    if __name__ == '__main__':
        import profile
        profile.run('dict_1()')
        profile.run('dict_2()')
    
  3. At 5:34 p.m. on 20 may 2003, Matthew Shomphe said:

    Some specifics. Here are some specifics on the test I ran above:

          3 function calls in 0.752 CPU seconds
    
    Ordered by: standard name
    
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.038    0.038    0.690    0.690 :1(?)
         1    0.653    0.653    0.653    0.653 dictTest.py:2(dict_1)
         1    0.061    0.061    0.752    0.752 profile:0(dict_1())
         0    0.000             0.000          profile:0(profiler)
    

    ======================================================================

          3 function calls in 551.600 CPU seconds
    
    Ordered by: standard name
    
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.026    0.026  551.518  551.518 :1(?)
         1  551.492  551.492  551.492  551.492 dictTest.py:7(dict_2)
         1    0.082    0.082  551.600  551.600 profile:0(dict_2())
         0    0.000             0.000          profile:0(profiler)
    

    As you can see, there's a massive difference.

    My system:

    Python 2.3b1

    Windows 2K

    PIII 1.13 GHz
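The gap in the timings above is consistent with how the two idioms treat the list: append adds to the existing list in place, whereas `old_list + [item]` builds a brand-new list and copies every existing element on each iteration, so the total work grows roughly with the square of the list length. The sketch below is a hypothetical re-run of that comparison, not the commenter's script: it assumes Python 3 (range instead of xrange), uses timeit rather than profile, uses get in the concatenating version as in the first comment, and picks a smaller n so it finishes quickly. Absolute timings will differ by machine.

```python
import timeit

WORDS = ['this', 'that', 'other']

def with_setdefault(n):
    """Append in place to the list stored in the dictionary."""
    d = {}
    for x in range(n):
        d.setdefault(WORDS[x % 3], []).append(x)
    return d

def with_concatenation(n):
    """Build a new list on every iteration by concatenation."""
    d = {}
    for x in range(n):
        word = WORDS[x % 3]
        d[word] = d.get(word, []) + [x]   # copies the whole list each time
    return d

if __name__ == '__main__':
    n = 20000   # assumed size; raise it to make the gap more visible
    for func in (with_setdefault, with_concatenation):
        seconds = timeit.timeit(lambda: func(n), number=1)
        print(func.__name__, seconds)
```

Both functions produce the same dictionary; only the cost of getting there differs.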
