ActiveState Code

Recipe 59875: Finding the intersection of two dicts


A simple little task, how to find the intersection of two hashes. There's probably many ways to do this but there is one way to avoid...

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# two dictionaries
# blow this up to 1000 to see the difference
some_dict = { 'zope':'zzz', 'python':'rocks' }
another_dict = { 'python':'rocks', 'perl':'$' }

# bad way
# two lots of "in"
intersect = []
for item in some_dict.keys():
  if item in another_dict.keys():
    intersect.append(item)

print "Intersects:", intersect

# good way
# use simple lookup with has_keys()
intersect = []
for item in some_dict.keys():
  if another_dict.has_key(item):
    intersect.append(item)

print "Intersects:", intersect

Discussion

The keys method produces a list of all the keys to dict and it can be pretty simple to fall in to the trap of just using "in". However in the first example you looping through all of some_dict, then each time looping through all of another_dict.

By using the has_keys method, you are not looping another_dict, but pulling the key from the dict in the resizable hash table. The difference is understandably quite remarkable for large dictionaries.

Comments

  1. 1. At 10:47 a.m. on 11 jun 2001, Tom Good said:

    I think the code here has a typo, it should say "has_key" instead of "has_keys"

  2. 2. At 2:50 a.m. on 18 jun 2001, Alex Martelli said:

    A Python 2.2 enhancement. Note that the difference disappears in the forthcoming Python 2.2: there, "if somekey in in adict" becomes equivalent to "if adict.has_key(somekey)". The recipe should therefore be indicated as only important for Python 2.1 and earlier.

  3. 3. At 9:28 a.m. on 19 jun 2001, andy mckay (the author) said:

    Oops. Thanks.

  4. 4. At 10:17 a.m. on 20 jun 2001, Chris Perkins said:

    Alternatively...

    intersect = [item for item in some_dict.keys() if another_dict.has_key(item)]
    
    or
    
    intersect = filter(another_dict.has_key, some_dict.keys())
    
  5. 5. At 6:34 a.m. on 9 jul 2001, Sami Hangaslammi said:

    Dictionaries of different size. It is also worthwhile to note that if one dictionary has considerably more elements, the smallest dictionary should be the one to loop over. However, the difference starts to get noticeable only with dictionaries that have thousands of keys.

  6. 6. At 8:28 p.m. on 1 may 2003, Robin Siebler said:

    Finding items not in both dictionaries. What if I want to find the items that are not in both dictionaries?

  7. 7. At 10:57 a.m. on 7 dec 2007, s g said:

    Finding intersections and non-intersections. ### Finding intersections and non-intersections

    some_dict = { 'zope':'zzz', 'python':'rocks','1':'one','2':'two','3':'three','4':'four' }

    another_dict = { 'python':'rocks', 'perl':'$','2':'two','3':'three' }

    intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if another_dict.has_key(item)])

    non_intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])

    print "a new dict containing the intersection:",intersection

    print "a new dict containing the non-intersection:",non_intersection

  8. 8. At 9:46 p.m. on 7 jun 2009, dev player said:

    I do not know what a non-intersection is through education but I expected a non-intersection is:

    keys in some_dict but not in another_dict AND

    keys in another_dict but not in some_dict.

    s g's recipe:

    non_intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])
    

    Testing this code in python 2.5 and Pythonwin with "print non_interection" shows that another_dict['perl'] was not appended to non_intersection even though it is not in some_dict.

  9. 9. At 1:49 p.m. on 16 sep 2009, Aaron Mitchell said:

    You are correct. In order to complete the non-intersection, you must reverse the dictionaries in the same command:

    non_intersection1 = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])
    non_intersection2 = dict([(item,another_dict[item]) for item in another_dict.keys() if not some_dict.has_key(item)])
    

    and then merge them together:

    non_intersection = non_intersection1.update( non_intersection2 )
    

Sign in to comment