Yet Another Unique() Function

Tim Peters's recipe (52560) and bearophile's version (438599) seem a bit too complex. Each has speed and sorting issues, and neither keeps the data type of the input object. Here is my take on a Python unique() function for sequences (list, tuple, str).

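def unique(inlist, keepstr=True):
  typ = type(inlist)
  # Work on a list; note that a list argument is modified in place.
  if typ != list:
    inlist = list(inlist)
  i = 0
  while i < len(inlist):
    try:
      # Delete the next later duplicate of inlist[i], if any.
      del inlist[inlist.index(inlist[i], i + 1)]
    except ValueError:
      # No later duplicate is left; advance to the next element.
      i += 1
  # Convert back to the input type; strings are re-joined only if keepstr.
  if typ not in (str, unicode):
    inlist = typ(inlist)
  elif keepstr:
    inlist = ''.join(inlist)
  return inlist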

##
## testing...
##

assert unique( [[1], [2]] ) == [[1], [2]]
assert unique( ((1,),(2,)) ) == ((1,), (2,))
assert unique( ([1,],[2,]) ) == ([1,], [2,])
assert unique( ([1+2J],[2+1J],[1+2J]) ) == ([1+2j], [2+1j])
assert unique( ([1+2J],[1+2J]) ) == ([1+2j],)
assert unique( [0] * 1000 ) == [0]
assert unique( [1, 2, 3, 1, 2]) == [1, 2, 3]
assert unique( [3, 2, 3, 1, 2]) == [3, 2, 1]
s = "iterable dict based unique"
assert unique(s) == 'iterabl dcsunq'
assert unique(s, False) == ['i', 't', 'e', 'r', 'a', 'b', 'l', ' ', 'd', 'c', 's', 'u', 'n', 'q']
s = unicode(s)
assert unique(s, False) == [u'i', u't', u'e', u'r', u'a', u'b', u'l', u' ', u'd', u'c', u's', u'u', u'n', u'q']
assert unique(s) == u'iterabl dcsunq'

# all asserts should pass!

      

This version passes all the quasi-unit tests in bearophile's recipe (albeit returning the original object type [optionally for the str type] rather than returning a list unconditionally). No use of sets or dicts is required, and ordering is preserved. I haven't tested for speed, but subjectively it seems as fast as, if not faster than, the other recipes. Let me know if there are problems or odd cases I haven't accounted for. As far as my own testing goes, this works very well.

NB: Only tested under 2.4 and 2.5.
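
For readers on Python 3, where `unicode` no longer exists, the same delete-later-duplicates approach might be sketched as below. This is an illustrative port, not the recipe as tested above: the name `unique_py3` is made up for this example, it always copies its input rather than modifying a list argument in place, and it assumes the input type can be rebuilt from a list (true for `list` and `tuple`).

```python
def unique_py3(seq, keepstr=True):
    """Order-preserving de-duplication (hypothetical Python 3 port)."""
    typ = type(seq)
    items = list(seq)  # copy, so the caller's object is left untouched
    i = 0
    while i < len(items):
        try:
            # Remove the next later duplicate of items[i], if there is one.
            del items[items.index(items[i], i + 1)]
        except ValueError:
            # No later duplicate remains, so move on to the next element.
            i += 1
    if typ is str:
        # Strings are re-joined only when keepstr is true.
        return ''.join(items) if keepstr else items
    return typ(items)


print(unique_py3([3, 2, 3, 1, 2]))
print(unique_py3("iterable dict based unique"))
```

Because every deletion shifts the tail of the list and every `index` call is a linear scan, this sketch is quadratic in the worst case, which is worth keeping in mind for long inputs.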

Tags: algorithms

5 comments

Jordan Callicoat (author) 17 years, 1 month ago # | flag

Nicer version from Paul Rubin. Paul Rubin posted a better version on the Python mailing list:

def unique(seq, keepstr=True):
  t = type(seq)
  if t==str:
    t = (list, ''.join)[bool(keepstr)]
  seen = []
  return t(c for c in seq if not (c in seen or seen.append(c)))

Nice!

Jordan Callicoat (author) 17 years, 1 month ago # | flag

Fixed to work with unicode string objects.

def unique(seq, keepstr=True):
  t = type(seq)
  if t in (str, unicode):
    t = (list, ''.join)[bool(keepstr)]
  seen = []
  return t(c for c in seq if not (c in seen or seen.append(c)))
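
The one-liner relies on `list.append` returning `None`: `c in seen or seen.append(c)` is truthy only when `c` has already been seen, and otherwise records `c` as a side effect and evaluates to `None`. An unrolled sketch of the same logic for Python 3 follows; the names `unique_unrolled` and `first_time` are hypothetical, and the `unicode` check is dropped on the assumption that Python 3 `str` covers that case.

```python
def unique_unrolled(seq, keepstr=True):
    t = type(seq)
    if t is str:
        # Rebuild a string with ''.join, or hand back a plain list.
        t = ''.join if keepstr else list
    seen = []

    def first_time(c):
        # True the first time c is encountered, False on every repeat.
        if c in seen:
            return False
        seen.append(c)
        return True

    return t(c for c in seq if first_time(c))


print(unique_unrolled((1, 2, 1, 3)))
```

Membership tests against a list use equality only, so this works for unhashable items such as nested lists, at the cost of a linear scan per element.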

Diego Novella 17 years, 1 month ago # | flag

Using a set instead of a list? What about replacing the last two lines with:

  seen = set()
  return t(c for c in seq if not (c in seen or seen.add(c)))

? This should be faster.

Jordan Callicoat (author) 17 years, 1 month ago # | flag

Using set() fails the first assert. The problem is that "list objects are unhashable". Paul Rubin posted a case-optimized version (using sets where possible) to the Python group:

http://tinyurl.com/24zchj
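
The linked post is not reproduced here. Purely as an illustration of the general idea (use a set while elements are hashable, fall back to a list otherwise), a Python 3 sketch could look like the following; the name `unique_hybrid` and its structure are assumptions of this example, not Paul Rubin's code.

```python
def unique_hybrid(seq, keepstr=True):
    t = type(seq)
    if t is str:
        t = ''.join if keepstr else list
    seen_hashable = set()  # fast membership test for hashable items
    seen_other = []        # linear-search fallback for unhashable items
    out = []
    for c in seq:
        try:
            if c in seen_hashable:
                continue
            seen_hashable.add(c)
        except TypeError:
            # c is unhashable (for example a list), so compare by equality.
            if c in seen_other:
                continue
            seen_other.append(c)
        out.append(c)
    return t(out)


print(unique_hybrid([[1], [2], [1]]))
print(unique_hybrid([3, 2, 3, 1, 2]))
```

Hashable inputs take the set path throughout, while inputs such as lists of lists degrade to the list-based behaviour of the earlier versions.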

Diego Novella 17 years, 1 month ago # | flag

Thank you for the link, I like that solution :-)


Yet Another Unique() Function (Python recipe) by Jordan Callicoat
ActiveState Code (http://code.activestate.com/recipes/502263/)
