ActiveState Code

Recipe 502263: Yet Another Unique() Function


Tim Peters' recipe (52560) and bearophile's version (438599) seem a bit too complex. Each has speed and sorting issues, and neither keeps the data type of the input object. Here is my take on a Python unique() function for sequences (list, tuple, str).

Python
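def unique(inlist, keepstr=True):
    """Return inlist without duplicates, keeping first occurrences in order."""
    typ = type(inlist)
    # Always work on a copy so that a list argument is not modified in place.
    items = list(inlist)
    i = 0
    while i < len(items):
        try:
            # Delete the next later duplicate of items[i], if there is one.
            del items[items.index(items[i], i + 1)]
        except ValueError:
            # list.index() found no later duplicate: move to the next item.
            i += 1
    if typ not in (str, unicode):
        return typ(items)          # rebuild the input's type (tuple, list, ...)
    if keepstr:
        return ''.join(items)      # str/unicode in, same string type out
    return items                   # str/unicode in, list of characters out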

##
## testing...
##

assert unique( [[1], [2]] ) == [[1], [2]]
assert unique( ((1,),(2,)) ) == ((1,), (2,))
assert unique( ([1,],[2,]) ) == ([1,], [2,])
assert unique( ([1+2J],[2+1J],[1+2J]) ) == ([1+2j], [2+1j])
assert unique( ([1+2J],[1+2J]) ) == ([1+2j],)
assert unique( [0] * 1000 ) == [0]
assert unique( [1, 2, 3, 1, 2]) == [1, 2, 3]
assert unique( [3, 2, 3, 1, 2]) == [3, 2, 1]
s = "iterable dict based unique"
assert unique(s) == 'iterabl dcsunq'
assert unique(s, False) == ['i', 't', 'e', 'r', 'a', 'b', 'l', ' ', 'd', 'c', 's', 'u', 'n', 'q']
s = unicode(s)
assert unique(s, False) == [u'i', u't', u'e', u'r', u'a', u'b', u'l', u' ', u'd', u'c', u's', u'u', u'n', u'q']
assert unique(s) == u'iterabl dcsunq'

# all asserts should pass!

Discussion

This version passes all the quasi-unit tests in bearophile's recipe (albeit returning the original object type, optionally for the str type, rather than unconditionally returning a list). It needs no sets or dicts, so unhashable items such as lists work, and it preserves ordering. I haven't benchmarked it; subjectively it seems as fast as the other recipes, if not faster, although each list.index() call scans the rest of the list, so the worst case is quadratic in the input length. Let me know if you find problems or odd cases I haven't accounted for. In my own testing, it works very well.

NB: Only tested under 2.4 and 2.5.
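To check the Discussion's claims (no sets or dicts, order preserved, input type kept) on a current interpreter, here is a minimal Python 3 sketch of the same index/del loop. It is an illustrative port, not the code tested under 2.4 and 2.5: Python 3 has a single `str` type, so the `unicode` branch is dropped, and the input is copied up front so a list argument is never modified in place. Because each `list.index()` call scans the rest of the list and each `del` shifts the tail, the loop does quadratic work in the worst case.

```python
def unique(seq, keepstr=True):
    """Python 3 sketch of the recipe's index/del approach.

    Removes later duplicates from a private copy (so the caller's object
    is untouched), then rebuilds the input's type.
    """
    typ = type(seq)
    items = list(seq)  # private copy; works for any iterable input
    i = 0
    while i < len(items):
        try:
            # list.index(x, start) raises ValueError when x does not occur
            # at or after position start.
            del items[items.index(items[i], i + 1)]
        except ValueError:
            i += 1  # no later duplicate of items[i]; move on
    if typ is str:
        # keepstr=True rebuilds a string; keepstr=False returns the characters
        return ''.join(items) if keepstr else items
    return typ(items)
```

For example, `unique([3, 2, 3, 1, 2])` returns `[3, 2, 1]` and `unique("iterable dict based unique")` returns `'iterabl dcsunq'`, matching the asserts above.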

Comments

  1. At 7:09 p.m. on 27 Feb 2007, Jordan Callicoat (the author) said:

    Nicer version from Paul Rubin. Paul Rubin posted a better version to the Python mailing list:

    def unique(seq, keepstr=True):
      t = type(seq)
      if t==str:
        t = (list, ''.join)[bool(keepstr)]
      seen = []
      return t(c for c in seq if not (c in seen or seen.append(c)))
    

    Nice!

  2. At 9:53 a.m. on 28 Feb 2007, Jordan Callicoat (the author) said:

    Fixed to work with unicode string objects.

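    def unique(seq, keepstr=True):
      t = type(seq)
      if t in (str, unicode):
        # bool(keepstr) indexes the pair: True -> ''.join, False -> list
        t = (list, ''.join)[bool(keepstr)]
      seen = []
      # seen.append(c) returns None, so each first occurrence passes the filter once
      return t(c for c in seq if not (c in seen or seen.append(c)))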
    
  3. At 10:25 a.m. on 2 Mar 2007, Diego Novella said:

    Why not use a set instead of a list? What about replacing the last two lines with:

    seen = set()
    return t(c for c in seq if not (c in seen or seen.add(c)))

    This should be faster.

  4. At 3:08 p.m. on 2 Mar 2007, Jordan Callicoat (the author) said:

    Using set() fails the first assert, because list objects are unhashable. Paul Rubin posted a case-optimized version (using sets where possible) to the Python group:

    http://tinyurl.com/24zchj

  5. At 4:14 p.m. on 4 Mar 2007, Diego Novella said:

    Thank You for the link, I like that solution :-)
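The comment thread discusses three approaches: Paul Rubin's list-based filter, Diego Novella's set-based variant, and a case-optimized version that uses sets where possible. The Python 3 sketch below puts them side by side. The function names are hypothetical, and `unique_hybrid` is only an illustration of the "sets where possible" idea, not the code behind the link above, which is not reproduced here.

```python
def unique_list(seq, keepstr=True):
    """Python 3 port of the list-based filter from comments 1 and 2.

    seen.append(c) returns None (falsy), so `c in seen or seen.append(c)`
    is True for a repeat and None for a new item; `not` then keeps only
    the first occurrence of each value. Worst case O(n^2), because
    `c in seen` scans a list.
    """
    t = type(seq)
    if t is str:  # Python 3 has a single text type, so no unicode check
        t = ''.join if keepstr else list
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))


def unique_set(seq):
    """Set-based variant from comment 3 (hypothetical name).

    Average O(1) membership tests, but every item must be hashable:
    unhashable items such as lists raise TypeError.
    """
    seen = set()
    return [c for c in seq if not (c in seen or seen.add(c))]


def unique_hybrid(seq):
    """Hypothetical hybrid: a set for hashable items, a list scan otherwise.

    Hashable and unhashable items are tracked separately, so a hashable
    item that compares equal to an unhashable one (e.g. frozenset({1})
    and {1}) is not treated as a duplicate.
    """
    seen_hashable = set()
    seen_unhashable = []
    result = []
    for c in seq:
        try:
            if c in seen_hashable:
                continue
            seen_hashable.add(c)
        except TypeError:  # raised when c is unhashable, e.g. a list
            if c in seen_unhashable:
                continue
            seen_unhashable.append(c)
        result.append(c)
    return result
```

Here `unique_set([[1], [2]])` raises `TypeError` because lists are unhashable, which is the failure described in comment 4, while `unique_hybrid([[1], 3, [2], [1], 3])` returns `[[1], 3, [2]]`. Unlike the original recipe, the set-based and hybrid sketches always return a list.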
