Welcome, guest | Sign In | My Account | Store | Cart

The default compare function used when sorting a list of strings uses the ordinal value of the string characters for comparison. This results in the typical ASCII sort result of 'B' < 'a'. If a user is going to see the list, it is generally better to do a case-insensitive comparison.

Python, 9 lines
1
2
3
4
5
6
7
8
9
def caseinsensitive_sort(stringList):
    """case-insensitive string comparison sort
    doesn't do locale-specific compare
    though that would be a nice addition
    usage: stringList = caseinsensitive_sort(stringList)"""

    tupleList = [(x.lower(), x) for x in stringList]
    tupleList.sort()
    return [x[1] for x in tupleList]

While you can provide your own compare method to pass to the sort method such as:

def compare(a, b): return cmp(a.lower(), b.lower())

stringList.sort(compare)

or use a lambda

stringList.sort(lambda a, b: cmp(a.lower(), b.lower()))

this results in the lower() conversion for every comparison. Thanks to Martin v. Loewis for making this all clear.

3 comments

Robin Thomas 18 years, 11 months ago  # | flag

you can amend the recipe to sort in-place as well. list.sort, significantly, sorts the list object in place (as links to this recipe have mentioned). The method returns None, not the list object, to remind you of this side effect.

If you wish to constrain your sort implementation to behave as list.sort -- either to subclass list or merely to have your sort routines be consistent with list.sort -- use the function in the recipe, but at the end of the function, do

list[:] = [x[1] for x in tuplesList]

and that will effect the sort "in place".

IIRC, list.sort in CPython, implemented natively in C, takes some precautions to avoid problems with concurrent use of the list by another thread while it's being sorted by the thread doing list.sort(). I don't know whether those precautions are important enough to use that they preclude using your sort function with a full-on replacement of the list contents at the end.

You could try using a compare function that uses a dict object to "memoize" the results of upper on each element.

def sort_case_insensitive(listobj):
    memodict = {}
    def compare(a,b):
        aval = memodict.get(a)
        if aval is None: aval = memodict.setdefault(a, a.upper())
        bval = memodict.get(b)
        if bval is None: bval = memodict.setdefault(b, b.upper())
        return cmp(aval, bval)
    listobj.sort(compare)

While correct, the above is about twice as slow as list.sort(lambda a,b: cmp(a.upper(), b.upper()) for a list of 52 unique strings sharing 26 unique upper-case representations.

Guido van Rossum 18 years, 11 months ago  # | flag

Schwartzian Transform. This is an example of a Schwartzian Transform (a Perl concept that works in Python too). It's in the Python FAQ:

http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.051.htp

Martin Miller 7 years, 3 months ago  # | flag

The link in Guido's ancient comment is broken. I think its equivalent would likely now be the to the section titled The Old Way Using Decorate-Sort-Undecorate in the Python Sorting HOWTO -- which refers to it by another common name for the technique.

Created by Kevin Altis on Thu, 19 Dec 2002 (PSF)
Python recipes (4591)
Kevin Altis's recipes (4)

Required Modules

  • (none specified)

Other Information and Tasks