Welcome, guest | Sign In | My Account | Store | Cart

Solution to the Hamming Number problem. Demonstrates a lazy evaluation evaluation technique using itertools.tee() to feed an iterator into itself. This is a common technique with Haskell. The deferred_output() function is the key technique for implementing a forward reference to the output of the stream.

Python, 26 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
from itertools import tee, chain, islice, groupby
from heapq import merge

def hamming_numbers():
    # Generate "5-smooth" numbers, also called "Hamming numbers"
    # or "Regular numbers".  See: http://en.wikipedia.org/wiki/Regular_number
    # Finds solutions to 2**i * 3**j * 5**k  for some integers i, j, and k.

    def deferred_output():
        'Works like a forward reference to the "output" global variable'
        for i in output:
            yield i

    result, p2, p3, p5 = tee(deferred_output(), 4)  # split the output streams
    m2 = (2*x for x in p2)                          # multiples of 2
    m3 = (3*x for x in p3)                          # multiples of 3
    m5 = (5*x for x in p5)                          # multiples of 5
    merged = merge(m2, m3, m5)
    combined = chain([1], merged)                   # prepend starting point
    output = (k for k, v in groupby(combined))      # eliminate duplicates

    return result


if __name__ == '__main__':
    print(list(islice(hamming_numbers(), 1000)))    # Show first 1000 hamming numbers

Problem statement (see http://en.wikipedia.org/wiki/Regular_number ) : "Output in ascending sequence all numbers which have only 2, 3, and 5 as non-trivial factors. The first such number is 1; and if x is any such number, so are 2x, 3x, and 5x. Use three generators to produce the multiples and feed them to a merge routine, then eliminate duplicates which ensures they are eventually output in ascending order."

  1. The output is split four ways: one returned result and three copies to feed back in.
  2. Each of the three copies is consumed in a generator of multiples.
  3. The combined result has 1 as a starting point and is followed by the merge of the multiples.
  4. Repeated values (runs of the same value) are eliminated.

The important part of the technique is the deferred_output function which lets you assume the result is known. Using those assumed values, the actual results get assigned to the output variable which is referred to by deferred_output. The tee itertool copies and splits-off the returned result from the buffered streams which are fed back into the multiple generators. The chain itertool loads the initial value to the result so that the cyclical iterator will have a starting point.

This is a general purpose technique. It is used extensively in Haskell programming language examples.

The same technique can be used for a fibonacci sequence (though there are much simpler ways to do it):

def fibonacci():
    def deferred_output():
        for i in output:
            yield i

    result, c1, c2 = tee(deferred_output(), 3)
    paired = map(add, c1, islice(c2, 1, None))
    output = chain([0, 1], paired)
    return result

print(list(islice(fibonacci(), 50)))

5 comments

Mike Krell 12 years ago  # | flag

I needed to change "i is not last" to "i != last" to get no_repeats() to work properly.

Paddy McCarthy 12 years ago  # | flag

Could function deferred output be seen as a form of forward declaration of variable output, allowing it to be referenced before it is later assigned too?

  • Paddy.

P.S. this recipe inspired this

Paddy McCarthy 12 years ago  # | flag

As written above, I get repeats in the sequence:

>>> rh = list(islice(raymonds_hamming(), 1500))
>>> rh[:100]
[1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 128, 135, 144, 150, 160, 162, 180, 192, 200, 216, 225, 240, 243, 250, 256, 270, 270, 270, 288, 288, 300, 300, 300, 320, 320, 324, 324, 360, 360, 360, 375, 375, 384, 384, 400, 400, 405, 405, 432, 432, 450, 450, 450, 480, 480, 480, 486, 486, 500, 500, 512, 540, 540, 540, 540, 540, 576, 576, 576, 600, 600, 600, 600]
>>>

After making Mike's suggested change, the repeats disappear and the results tally with another generator using a different algorithm:

>>> rh = list(islice(raymonds_hamming(), 1500))
>>> rh[:100]
[1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 128, 135, 144, 150, 160, 162, 180, 192, 200, 216, 225, 240, 243, 250, 256, 270, 288, 300, 320, 324, 360, 375, 384, 400, 405, 432, 450, 480, 486, 500, 512, 540, 576, 600, 625, 640, 648, 675, 720, 729, 750, 768, 800, 810, 864, 900, 960, 972, 1000, 1024, 1080, 1125, 1152, 1200, 1215, 1250, 1280, 1296, 1350, 1440, 1458, 1500, 1536]
  • Paddy.
bearophile - 11 years, 11 months ago  # | flag

Very nice, Raymond.
This whole idea looks general&useful enough that may deserve some support in the language or in the itertools module, if possible.

(I'll need to work some on the D language standard library to allow similar things).

In that fibonacci() function this line:
paired = map(add, c1, islice(c2, 1, None))
Needs an imap() to work, instead of a map().

Gabriel Genellina 11 years, 10 months ago  # | flag

@bearophile: although not explicitely stated, this recipe seems to be using Python 3, where builtin map behaves like itertools.imap in 2.x But since that's the only place where the difference 2 vs 3 matters, a short comment in the code would be nice.