Welcome, guest | Sign In | My Account | Store | Cart

naive natural sort (Python recipe) by Romain Dartigues
ActiveState Code (http://code.activestate.com/recipes/577679/)

I wrote this after reading The Alphanum Algorithm (http://www.davekoelle.com/alphanum.html) by David Koelle a few years ago; my goal was to improve the performances of the Python version of his scripts.

My version is approximatly 10 times faster than it's alphanum.py and about 3 times faster than the alphanum.py_v2.4 on my computer, yielding the same results (for non-unicode at least).

Note: see the version of wizkid in the comments which is even faster.

      def keynat(string):
        '''A natural sort helper function for sort() and sorted()
        without using regular expression.
        '''
        r = []
        for c in string:
                if c.isdigit():
                        if r and isinstance(r[-1], int):
                                r[-1] = r[-1] * 10 + int(c)
                        else:
                                r.append(int(c))
                else:
                        r.append(c)
        return r

      

Usage:

>>> items = ['Z', 'a', '10', '1', '9']

>>> sorted(items)
['1', '10', '9', 'Z', 'a']

>>> sorted(items, key=keynat)
['1', '9', '10', 'Z', 'a']

>>> items.sort(cmp=lambda a,b:cmp(keynat(a), keynat(b)))
>>> items
['1', '9', '10', 'Z', 'a']

Tags: ascii, sort

4 comments

Vincent Tsai 13 years ago # | flag

Where is "keynat2"?

Romain Dartigues (author) 13 years ago # | flag

Damn, i should try to check what i'm posting sometimes :) thanks for reporting the typo.

wizkid 12 years, 8 months ago # | flag

I made my own version before I found this post. So I benchmarked both.

On small keys mine was 30% faster:

sorted([binascii.hexlify(os.urandom(10)) for i in xrange(100000)], key=lambda s: keynat(s))

And on larger keys it was about 50% faster

sorted([binascii.hexlify(os.urandom(1000)) for i in xrange(1000)], key=lambda s: keynat(s))

My function looks like this. And I think one reason is that I don't call int() on every digit:

def natkey(s):
    if s == "":
        return [""]

    parts = []

    part = ""
    numberMode = False
    for c in s:
        if c.isdigit() ^ numberMode:
            if numberMode:
                part = int(part)
            parts.append(part)

            part = c

            numberMode = not numberMode
        else:
            part += c

    if numberMode:
        part = int(part)
    parts.append(part)

    return parts

Romain Dartigues (author) 12 years, 8 months ago # | flag

Hi wizkid, yes, your version made mine totally obsolete; is 44 to 60% faster than mine on my setup. Thank's for sharing it!

Created by Romain Dartigues on Thu, 28 Apr 2011 (MIT)

◄	Python recipes (4591)	►
◄	Romain Dartigues's recipes (6)	►

Required Modules

(none specified)

Other Information and Tasks

Licensed under the MIT License
Viewed 6258 times
Revision 1

Accounts

Code Recipes

Feedback & Information

ActiveState

© 2024 ActiveState Software Inc. All rights reserved. ActiveState®, Komodo®, ActiveState Perl Dev Kit®, ActiveState Tcl Dev Kit®, ActivePerl®, ActivePython®, and ActiveTcl® are registered trademarks of ActiveState. All other marks are property of their respective owners.

naive natural sort (Python recipe) by Romain Dartigues ActiveState Code (http://code.activestate.com/recipes/577679/)