Sorts strings in a way that seems natural to humans. If the strings contain integers, then the integers are ordered numerically. For example, sorts ['Team 11', 'Team 3', 'Team 1'] into the order ['Team 1', 'Team 3', 'Team 11'].
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | # ---------------------------------------------------------
# natsort.py: Natural string sorting.
# ---------------------------------------------------------
# By Seo Sanghyeon. Some changes by Connelly Barnes.
def try_int(s):
"Convert to integer if possible."
try: return int(s)
except: return s
def natsort_key(s):
"Used internally to get a tuple by which s is sorted."
import re
return map(try_int, re.findall(r'(\d+|\D+)', s))
def natcmp(a, b):
"Natural string comparison, case sensitive."
return cmp(natsort_key(a), natsort_key(b))
def natcasecmp(a, b):
"Natural string comparison, ignores case."
return natcmp(a.lower(), b.lower())
def natsort(seq, cmp=natcmp):
"In-place natural string sort."
seq.sort(cmp)
def natsorted(seq, cmp=natcmp):
"Returns a copy of seq, sorted by natural string sort."
import copy
temp = copy.copy(seq)
natsort(temp, cmp)
return temp
|
You can use this code to sort tarball filenames:
>>> natsorted(['ver-1.3.12', 'ver-1.3.3', 'ver-1.2.5', 'ver-1.2.15', 'ver-1.2.3', 'ver-1.2.1'])
['ver-1.2.1', 'ver-1.2.3', 'ver-1.2.5', 'ver-1.2.15', 'ver-1.3.3', 'ver-1.3.12']
Chemical elements:
>>> natsorted(['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N', 'C3H6'])
['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N', 'C3H6']
Teams:
>>> natsorted(['Team 101', 'Team 58', 'Team 30', 'Team 1'])
['Team 1', 'Team 30', 'Team 58', 'Team 101']
Pass natcasecmp as a second argument for case-insensitive sorting:
>>> natsorted(['a5', 'A7', 'a15', 'a9', 'A8'], natcasecmp)
['a5', 'A7', 'A8', 'a9', 'a15']
Tags: text
How about a decorate, sort, undecorate technique? Here's an alternative approach that also makes use of re but may be faster by using the decorate, sort, undecorate idiom:
The re.sub method is used to prepend every numeric group with it's length, so 1 becomes 11, 25 becomes 325 and so on. This is done as a decorate step and the list can then be sorted with the standard sort() function (or sorted() in Python 2.4). Then the list is undecorated and left in sorted order.
short, modern alternative code. This uses newer features, like List Comprehensions [ http://www.network-theory.co.uk/docs/pytut/tut_38.html ] to make the code very small and fast.
The output is the original data, but sorted in natural order:
simplified version of list comprehensions code. For sorting lists:
Simple version not relying on the
re
moduleCompability (tested with: python 1.5.1, 2.2.1, 2.3.4, 2.4.3, 2.5.2).
Performances tested with
timeit
shows no signifiant differences (slightly faster however) with there
versions on small and medium lists. Similar solutions on: http://www.davekoelle.com/alphanum.htmlHere's a version I've been using which fully exploits list comprehensions. In addition to converting strings of digits to ints, it also allows case-insensitive sorting.
For each line, it splits it along the '/' character, which guarantees that a directory name always appears before the contents of the directory (i.e. in case a filename starts with an ordinal lower than 0x20). Then, for each path part, it breaks it into a list of digits and non-digits:
The primary benefit I find with this method is that it avoids use of try/except statements, and so lends itself well to oneliners for command line usage:
I haven't tested this method's performance, but I suspect it'd be a little bit slower than the other methods due to all the slicing and dicing.
(Tested in Python 2.5.2)
A similar function, but caseless comparison and avoiding exceptions. It handles digits much more quickly ( approx. 2-3 times ) in cpython 2.5, 2.6.
Here is my version of natural sort.
Here is the original code equipped to do reverse sorting: