ActiveState Code

Recipe 170242: case-insensitive sort of list of strings


The default compare function used when sorting a list of strings uses the ordinal value of the string characters for comparison. This results in the typical ASCII sort result of 'B' < 'a'. If a user is going to see the list, it is generally better to do a case-insensitive comparison.

Python
1
2
3
4
5
6
7
8
9
def caseinsensitive_sort(stringList):
    """case-insensitive string comparison sort
    doesn't do locale-specific compare
    though that would be a nice addition
    usage: stringList = caseinsensitive_sort(stringList)"""

    tupleList = [(x.lower(), x) for x in stringList]
    tupleList.sort()
    return [x[1] for x in tupleList]

Discussion

While you can provide your own compare method to pass to the sort method such as:

def compare(a, b): return cmp(a.lower(), b.lower())

stringList.sort(compare)

or use a lambda

stringList.sort(lambda a, b: cmp(a.lower(), b.lower()))

this results in the lower() conversion for every comparison. Thanks to Martin v. Loewis for making this all clear.

Comments

  1. 1. At 10:56 a.m. on 20 dec 2002, Robin Thomas said:

    you can amend the recipe to sort in-place as well. list.sort, significantly, sorts the list object in place (as links to this recipe have mentioned). The method returns None, not the list object, to remind you of this side effect.

    If you wish to constrain your sort implementation to behave as list.sort -- either to subclass list or merely to have your sort routines be consistent with list.sort -- use the function in the recipe, but at the end of the function, do

    list[:] = [x[1] for x in tuplesList]
    

    and that will effect the sort "in place".

    IIRC, list.sort in CPython, implemented natively in C, takes some precautions to avoid problems with concurrent use of the list by another thread while it's being sorted by the thread doing list.sort(). I don't know whether those precautions are important enough to use that they preclude using your sort function with a full-on replacement of the list contents at the end.

    You could try using a compare function that uses a dict object to "memoize" the results of upper on each element.

    def sort_case_insensitive(listobj):
        memodict = {}
        def compare(a,b):
            aval = memodict.get(a)
            if aval is None: aval = memodict.setdefault(a, a.upper())
            bval = memodict.get(b)
            if bval is None: bval = memodict.setdefault(b, b.upper())
            return cmp(aval, bval)
        listobj.sort(compare)
    

    While correct, the above is about twice as slow as list.sort(lambda a,b: cmp(a.upper(), b.upper()) for a list of 52 unique strings sharing 26 unique upper-case representations.

  2. 2. At 6:05 p.m. on 2 jan 2003, Guido van Rossum said:

    Schwartzian Transform. This is an example of a Schwartzian Transform (a Perl concept that works in Python too). It's in the Python FAQ:

    http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.051.htp

Sign in to comment