Welcome, guest | Sign In | My Account | Store | Cart

A simple little task, how to find the intersection of two hashes. There's probably many ways to do this but there is one way to avoid...

Python, 22 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# two dictionaries
# blow this up to 1000 to see the difference
some_dict = { 'zope':'zzz', 'python':'rocks' }
another_dict = { 'python':'rocks', 'perl':'$' }

# bad way
# two lots of "in"
intersect = []
for item in some_dict.keys():
  if item in another_dict.keys():
    intersect.append(item)

print "Intersects:", intersect

# good way
# use simple lookup with has_keys()
intersect = []
for item in some_dict.keys():
  if another_dict.has_key(item):
    intersect.append(item)

print "Intersects:", intersect

The keys method produces a list of all the keys to dict and it can be pretty simple to fall in to the trap of just using "in". However in the first example you looping through all of some_dict, then each time looping through all of another_dict.

By using the has_keys method, you are not looping another_dict, but pulling the key from the dict in the resizable hash table. The difference is understandably quite remarkable for large dictionaries.

10 comments

Tom Good 22 years, 11 months ago  # | flag

I think the code here has a typo, it should say "has_key" instead of "has_keys"

Alex Martelli 22 years, 10 months ago  # | flag

A Python 2.2 enhancement. Note that the difference disappears in the forthcoming Python 2.2: there, "if somekey in in adict" becomes equivalent to "if adict.has_key(somekey)". The recipe should therefore be indicated as only important for Python 2.1 and earlier.

andy mckay (author) 22 years, 10 months ago  # | flag

Oops. Thanks.

Chris Perkins 22 years, 10 months ago  # | flag

Alternatively...

intersect = [item for item in some_dict.keys() if another_dict.has_key(item)]

or

intersect = filter(another_dict.has_key, some_dict.keys())
Sami Hangaslammi 22 years, 10 months ago  # | flag

Dictionaries of different size. It is also worthwhile to note that if one dictionary has considerably more elements, the smallest dictionary should be the one to loop over. However, the difference starts to get noticeable only with dictionaries that have thousands of keys.

Robin Siebler 21 years ago  # | flag

Finding items not in both dictionaries. What if I want to find the items that are not in both dictionaries?

s g 16 years, 5 months ago  # | flag

Finding intersections and non-intersections. ### Finding intersections and non-intersections

some_dict = { 'zope':'zzz', 'python':'rocks','1':'one','2':'two','3':'three','4':'four' }

another_dict = { 'python':'rocks', 'perl':'$','2':'two','3':'three' }

intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if another_dict.has_key(item)])

non_intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])

print "a new dict containing the intersection:",intersection

print "a new dict containing the non-intersection:",non_intersection

Dev Player 14 years, 11 months ago  # | flag

I do not know what a non-intersection is through education but I expected a non-intersection is:

keys in some_dict but not in another_dict AND

keys in another_dict but not in some_dict.

s g's recipe:

non_intersection = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])

Testing this code in python 2.5 and Pythonwin with "print non_interection" shows that another_dict['perl'] was not appended to non_intersection even though it is not in some_dict.

Aaron Mitchell 14 years, 7 months ago  # | flag

You are correct. In order to complete the non-intersection, you must reverse the dictionaries in the same command:

non_intersection1 = dict([(item,some_dict[item]) for item in some_dict.keys() if not another_dict.has_key(item)])
non_intersection2 = dict([(item,another_dict[item]) for item in another_dict.keys() if not some_dict.has_key(item)])

and then merge them together:

non_intersection = non_intersection1.update( non_intersection2 )
sahib 11 years, 4 months ago  # | flag

I think in recent versions this might be easier. (Tested with Python 3.3.0)

>>> def intersect(d1, d2):
...     'Intersect dictionaries d1 and d2 by key *and* value'
...     return dict(d1.items() & d2.items())
>>>
>>> intersect({'a': 2}, {'b': 42, 'a': 2})
{'a': 2}
>>> intersect({'a': 2}, {'b': 42, 'a': 3})
{}

If you only want to intersect keys/values use keys() or values() instead (without the dict()). (Sorry if this is old, but google still lists it on first place)

Created by andy mckay on Thu, 31 May 2001 (PSF)
Python recipes (4591)
andy mckay's recipes (8)
Python Cookbook Edition 2 (117)
Python Cookbook Edition 1 (103)

Required Modules

  • (none specified)

Other Information and Tasks