Tim Peters' recipe (52560) and bearophile's version (438599) seem a bit too complex. Each has speed and sorting issues, and neither preserves the type of the input object. Here is my take on a Python unique() function for sequences (list, tuple, str).

Python, 37 lines
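def unique(inlist, keepstr=True):
  typ = type(inlist)  # remember the input type for the return value
  if typ is not list:  # note: a list argument is de-duplicated in place
    inlist = list(inlist)
  i = 0
  while i < len(inlist):
    try:
      del inlist[inlist.index(inlist[i], i + 1)]  # drop the next later duplicate
    except ValueError:  # no later duplicate of inlist[i] is left
      i += 1
  if typ not in (str, unicode):
    inlist = typ(inlist)
  else:
    if keepstr:
      inlist = ''.join(inlist)
  return inlist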

##
## testing...
##

assert unique( [[1], [2]] ) == [[1], [2]]
assert unique( ((1,),(2,)) ) == ((1,), (2,))
assert unique( ([1,],[2,]) ) == ([1,], [2,])
assert unique( ([1+2J],[2+1J],[1+2J]) ) == ([1+2j], [2+1j])
assert unique( ([1+2J],[1+2J]) ) == ([1+2j],)
assert unique( [0] * 1000 ) == [0]
assert unique( [1, 2, 3, 1, 2]) == [1, 2, 3]
assert unique( [3, 2, 3, 1, 2]) == [3, 2, 1]
s = "iterable dict based unique"
assert unique(s) == 'iterabl dcsunq'
assert unique(s, False) == ['i', 't', 'e', 'r', 'a', 'b', 'l', ' ', 'd', 'c', 's', 'u', 'n', 'q']
s = unicode(s)
assert unique(s, False) == [u'i', u't', u'e', u'r', u'a', u'b', u'l', u' ', u'd', u'c', u's', u'u', u'n', u'q']
assert unique(s) == u'iterabl dcsunq'

# all asserts should pass!

This version passes all the quasi-unit tests in bearophile's recipe (albeit returning the original object type, optionally for str, rather than unconditionally returning a list). No sets or dicts are required, and ordering is preserved. I haven't tested for speed, but subjectively it seems as fast as the other recipes, if not faster. Bear in mind that each index() call scans the rest of the list, so the running time grows quadratically with the input length. Let me know if there are problems or odd cases I haven't accounted for. As far as my own testing goes, this works very well.

NB: Only tested under 2.4 and 2.5.
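
For readers on Python 3, where `unicode` no longer exists and `str` is already Unicode, a port of the recipe might look like the sketch below. It is not the author's code: the name `unique_py3` is made up for illustration, and unlike the original it always works on a copy, so a list argument is left untouched.

```python
def unique_py3(seq, keepstr=True):
    """Order-preserving de-duplication that returns the input's own type.

    Hypothetical Python 3 port of the recipe; intended for list, tuple, str.
    """
    typ = type(seq)
    items = list(seq)  # always copy, so the caller's object is not modified
    i = 0
    while i < len(items):
        try:
            # Remove the next later duplicate of items[i]; stay on i so that
            # any further duplicates are removed on the following passes.
            del items[items.index(items[i], i + 1)]
        except ValueError:
            # index() found no later duplicate: move on to the next element.
            i += 1
    if typ is str:
        return ''.join(items) if keepstr else items
    return typ(items)


print(unique_py3([3, 2, 3, 1, 2]))
print(unique_py3(((1,), (2,), (1,))))
print(unique_py3("iterable dict based unique"))
```

Only equality comparison is used, so unhashable elements such as nested lists are fine.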

5 comments

Jordan Callicoat (author) 14 years, 9 months ago

Nicer version from Paul Rubin. Paul Rubin posted a better version on the Python mailing list:

def unique(seq, keepstr=True):
  t = type(seq)
  if t==str:
    t = (list, ''.join)[bool(keepstr)]
  seen = []
  return t(c for c in seq if not (c in seen or seen.append(c)))

Nice!
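
The generator expression leans on two facts: `list.append` returns `None`, and `or` short-circuits. For an unseen `c`, `c in seen` is false, so `seen.append(c)` runs and returns `None`, and `not None` is true, so `c` is kept. For a repeated `c`, `c in seen` is true, the append is skipped, and `c` is dropped. Spelled out as a plain loop, the same logic reads as follows. This is only a sketch with a hypothetical name, `unique_explicit`, written for Python 3 where `str` is the only string type.

```python
def unique_explicit(seq, keepstr=True):
    # Choose the callable used to rebuild the result from a list of items.
    t = type(seq)
    if t is str:
        t = ''.join if keepstr else list
    seen = []
    for c in seq:
        # Keep only the first occurrence of each element, in input order.
        if c not in seen:
            seen.append(c)
    return t(seen)


print(unique_explicit((3, 2, 3, 1, 2)))
print(unique_explicit("hello"))
```

Here `seen` doubles as the result, since it already holds the first occurrences in order.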

Jordan Callicoat (author) 14 years, 9 months ago

Fixed to work with unicode string objects.

def unique(seq, keepstr=True):
  t = type(seq)
  if t in (str, unicode):
    t = (list, ''.join)[bool(keepstr)]
  seen = []
  return t(c for c in seq if not (c in seen or seen.append(c)))

Diego Novella 14 years, 9 months ago

Using a set instead of a list? What about replacing the last two lines with the following?

  seen = set()
  return t(c for c in seq if not (c in seen or seen.add(c)))

This should be faster.
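
Whether the set really is faster can be checked with `timeit`. The sketch below compares the two `seen` containers on hashable input only. The helper names and the test data are made up for illustration, and no timings are quoted because they depend on the machine and on the data.

```python
import timeit


def unique_list_seen(seq):
    # Membership test on a list scans it element by element.
    seen = []
    return [c for c in seq if not (c in seen or seen.append(c))]


def unique_set_seen(seq):
    # Membership test on a set is a hash lookup; elements must be hashable.
    seen = set()
    return [c for c in seq if not (c in seen or seen.add(c))]


# Illustrative data: 1000 integers with 250 distinct values.
data = [i % 250 for i in range(1000)]

# Both variants must agree before their speed is compared.
assert unique_list_seen(data) == unique_set_seen(data)

for fn in (unique_list_seen, unique_set_seen):
    elapsed = timeit.timeit(lambda: fn(data), number=10)
    print(fn.__name__, round(elapsed, 4), "seconds for 10 runs")
```

The gap should widen as the number of distinct elements grows, since the list-based lookup has more to scan.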

Jordan Callicoat (author) 14 years, 9 months ago

Using set() fails the first assert, because list objects are unhashable and so cannot be stored in a set. Paul Rubin posted a case-optimized version (using sets where possible) to the Python group:

http://tinyurl.com/24zchj
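
The linked post is not reproduced here, so the following is not Paul Rubin's code. It is only a sketch of the general idea of using a set where possible: try the set-based `seen` first and fall back to the list-based one when an element turns out to be unhashable. The name `unique_hybrid` is hypothetical, and the sketch targets Python 3.

```python
def unique_hybrid(seq, keepstr=True):
    t = type(seq)
    if t is str:
        t = ''.join if keepstr else list
    items = list(seq)  # materialize once so a retry sees the same elements
    try:
        # Fast path: hash lookups, valid only if every element is hashable.
        seen = set()
        result = [c for c in items if not (c in seen or seen.add(c))]
    except TypeError:
        # At least one element is unhashable (e.g. a list): start over
        # with a list for `seen`, which needs only equality comparison.
        seen = []
        result = [c for c in items if not (c in seen or seen.append(c))]
    return t(result)


print(unique_hybrid([[1], [2], [1]]))   # takes the list-based fallback
print(unique_hybrid((3, 2, 3, 1, 2)))   # takes the set-based fast path
```

Falling back on `TypeError` keeps the function to a single entry point, at the cost of redoing the work when an unhashable element appears late in the input.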

Diego Novella 14 years, 9 months ago

Thank you for the link, I like that solution :-)

Created by Jordan Callicoat on Tue, 27 Feb 2007 (PSF)

Required Modules

  • (none specified)