Welcome, guest | Sign In | My Account | Store | Cart

I wrote this after reading The Alphanum Algorithm (http://www.davekoelle.com/alphanum.html) by David Koelle a few years ago; my goal was to improve the performances of the Python version of his scripts.

My version is approximatly 10 times faster than it's alphanum.py and about 3 times faster than the alphanum.py_v2.4 on my computer, yielding the same results (for non-unicode at least).

Note: see the version of wizkid in the comments which is even faster.

Python, 14 lines
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
def keynat(string):
        '''A natural sort helper function for sort() and sorted()
        without using regular expression.
        '''
        r = []
        for c in string:
                if c.isdigit():
                        if r and isinstance(r[-1], int):
                                r[-1] = r[-1] * 10 + int(c)
                        else:
                                r.append(int(c))
                else:
                        r.append(c)
        return r
Usage:
>>> items = ['Z', 'a', '10', '1', '9']

>>> sorted(items)
['1', '10', '9', 'Z', 'a']

>>> sorted(items, key=keynat)
['1', '9', '10', 'Z', 'a']

>>> items.sort(cmp=lambda a,b:cmp(keynat(a), keynat(b)))
>>> items
['1', '9', '10', 'Z', 'a']

4 comments

Vincent Tsai 12 years, 11 months ago  # | flag

Where is "keynat2"?

Romain Dartigues (author) 12 years, 11 months ago  # | flag

Damn, i should try to check what i'm posting sometimes :) thanks for reporting the typo.

wizkid 12 years, 7 months ago  # | flag

I made my own version before I found this post. So I benchmarked both.

On small keys mine was 30% faster:

sorted([binascii.hexlify(os.urandom(10)) for i in xrange(100000)], key=lambda s: keynat(s))

And on larger keys it was about 50% faster

sorted([binascii.hexlify(os.urandom(1000)) for i in xrange(1000)], key=lambda s: keynat(s))

My function looks like this. And I think one reason is that I don't call int() on every digit:

def natkey(s):
    if s == "":
        return [""]

    parts = []

    part = ""
    numberMode = False
    for c in s:
        if c.isdigit() ^ numberMode:
            if numberMode:
                part = int(part)
            parts.append(part)

            part = c

            numberMode = not numberMode
        else:
            part += c

    if numberMode:
        part = int(part)
    parts.append(part)

    return parts
Romain Dartigues (author) 12 years, 7 months ago  # | flag

Hi wizkid, yes, your version made mine totally obsolete; is 44 to 60% faster than mine on my setup. Thank's for sharing it!

Created by Romain Dartigues on Thu, 28 Apr 2011 (MIT)
Python recipes (4591)
Romain Dartigues's recipes (6)

Required Modules

  • (none specified)

Other Information and Tasks