Often, when working with a dictionary D, you need to use the entry D[k] if it's already present, or add a new D[k] if k wasn't a key into D. The setdefault method of dictionaries is a very handy shortcut for this task.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | # say we're building a word->pagenumbers index -- a key piece of code might be:
theIndex = {}
def addword(word, pagenumber):
if theIndex.has_key(word):
theIndex[word].append(pagenumber)
else:
theIndex[word] = [pagenumber]
# incidentally, a good Pythonic instinct would be to substitute this
# "look before you leap" pattern with a "easier to get permission":
def addword(word, pagenumber):
try: theIndex[word].append(pagenumber)
except AttributeError: theIndex[word] = [pagenumber]
# but this is by the by -- just a minor simplification. However,
# this meets the pattern "use the entry if already present, else
# add a new entry". Here's how using setdefault simplifies this:
def addword(word, pagenumber):
theIndex.setdefault(word,[]).append(pagenumber)
|
Basically, dict.setdefault(k,v) is much like dict.get(k,v), except that, if not dict.haskey(k), the setdefault methods assign dict]=v as well as returning v (while get would just return v, without affecting dict in any way). Therefore, setdefault is appropriate any time you have get-like needs but also need to have this side-effect on the dictionary.
setdefault is particularly useful for the very common data structure that is a dictionary whose values are lists, and the single most typical usage form for it is somedict.setdefault(somekey,[]).append(somevalue).
Note that setdefault is normally not useful if the values are immutable. If you just want to count words, for example, theIndex.setdefault(word,1) is not very useful -- rather, use theIndex[word] = 1 + theIndex.get(word,0).
Better? I'm curious to hear how this technique compares to another common idiom I've seen, which for the example would be:
TIA
I had the same question.... So I ran the following code. I have yet to get a result for dict_2(). It was running for far too long. dict_1() takes about .7 seconds.
Some specifics. Here are some specifics on the test I ran above:
======================================================================
As you can see, there's a massive difference.
My system:
Python 2.3b1
Windows 2K
PIII 1.13 GHz
Does the [] get evaluated every time the function is called, thus creating a new list that's passed to the func- tion? Or is there some optimization to keep a new list from being created (or from it costing too much at the implementation level, like copy-on-write) if unneeded? Cause if you use a constructor call, like in the below, there's definitely no magic shortcut happening
Latest versions of Python have
collections.defaultdict
for this very purpose