Tim Peters' recipe (52560) and bearophile's version (438599) seem a bit too complex. There are speed and sorting issues with each, and neither keeps the data type of the input object. Here is my take on a Python unique() function for enumerables (list, tuple, str).
def unique(inlist, keepstr=True):
    typ = type(inlist)
    # Work on a list; note that a list argument is modified in place.
    if typ is not list:
        inlist = list(inlist)
    i = 0
    while i < len(inlist):
        try:
            # Delete the next later occurrence of inlist[i], if there is one.
            del inlist[inlist.index(inlist[i], i + 1)]
        except ValueError:
            # index() found no later duplicate, so move on to the next item.
            i += 1
    if typ not in (str, unicode):
        # Convert back to the type of the input object.
        inlist = typ(inlist)
    else:
        if keepstr:
            inlist = ''.join(inlist)
    return inlist
##
## testing...
##
assert unique( [[1], [2]] ) == [[1], [2]]
assert unique( ((1,),(2,)) ) == ((1,), (2,))
assert unique( ([1,],[2,]) ) == ([1,], [2,])
assert unique( ([1+2J],[2+1J],[1+2J]) ) == ([1+2j], [2+1j])
assert unique( ([1+2J],[1+2J]) ) == ([1+2j],)
assert unique( [0] * 1000 ) == [0]
assert unique( [1, 2, 3, 1, 2]) == [1, 2, 3]
assert unique( [3, 2, 3, 1, 2]) == [3, 2, 1]
s = "iterable dict based unique"
assert unique(s) == 'iterabl dcsunq'
assert unique(s, False) == ['i', 't', 'e', 'r', 'a', 'b', 'l', ' ', 'd', 'c', 's', 'u', 'n', 'q']
s = unicode(s)
assert unique(s, False) == [u'i', u't', u'e', u'r', u'a', u'b', u'l', u' ', u'd', u'c', u's', u'u', u'n', u'q']
assert unique(s) == u'iterabl dcsunq'
# all asserts should pass!
This version passes all the quasi-unit tests in bearophile's recipe (albeit returning the original object type [optionally for the str type] rather than returning a list unconditionally). No use of sets or dicts is required, and ordering is preserved. I haven't tested for speed, but subjectively it seems as fast as, if not faster than, the other recipes. Let me know if there are problems or odd cases I haven't accounted for. As far as my own testing goes, this works very well.
NB: Only tested under 2.4 and 2.5.
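For readers on Python 3, where the `unicode` type no longer exists, the same index-scan idea might be sketched roughly as follows. This is an untested-by-the-author adaptation, not part of the recipe: the parameter name `items`, the decision to always copy the input (so a list argument is left untouched), and the omission of bytes handling are all assumptions.

```python
def unique(items, keepstr=True):
    """Order-preserving de-duplication by repeated index scans.

    Python 3 sketch of the recipe above (assumption: str is the only
    string type that needs special handling).
    """
    typ = type(items)
    work = list(items)  # always copy, so a list argument is not modified
    i = 0
    while i < len(work):
        try:
            # Delete the next later occurrence of work[i], if there is one.
            del work[work.index(work[i], i + 1)]
        except ValueError:
            # No later duplicate remains; move on to the next item.
            i += 1
    if typ is str:
        # Rejoin into a string unless the caller asked for a list.
        return "".join(work) if keepstr else work
    return typ(work)


print(unique([3, 2, 3, 1, 2]))
print(unique(((1,), (2,), (1,))))
print(unique("iterable dict based unique"))
```

Each item triggers one or more linear scans of the rest of the list, so the running time grows roughly quadratically with the input length. That is unlikely to matter for short inputs, but it is worth measuring before using this on large ones.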
Nicer version from Paul Rubin. Paul Rubin posted a better version on the Python mailing list:
Nice!
Fixed to work with unicode string objects.
Using a set instead of a list? What about replacing the last two lines with:
    seen = set()
    return t(c for c in seq if not (c in seen or seen.add(c)))
This should be faster.
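The fragment above relies on `t` and `seq` from a function that is not shown here. Assuming `seq` is the input sequence and `t` is its type, a complete Python 3 sketch of the set-based idea could look like the following. The function name `unique_hashable` and the special case for str are assumptions added for illustration (calling `str()` on a generator would not join the characters).

```python
def unique_hashable(seq):
    """Order-preserving de-duplication using a set of items already seen.

    `seq` and `t` follow the names in the fragment above; the enclosing
    function is hypothetical. Every item must be hashable.
    """
    t = type(seq)
    seen = set()
    # set.add() returns None, so the `or` records c on first sight while
    # the whole condition stays falsy and c is kept.
    kept = [c for c in seq if not (c in seen or seen.add(c))]
    # Strings need an explicit join; other types are rebuilt from the list.
    return "".join(kept) if t is str else t(kept)


print(unique_hashable([3, 2, 3, 1, 2]))
print(unique_hashable("abca"))
```

Membership tests on a set take roughly constant time on average, which is where the expected speed-up over a list scan comes from.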
Using set() fails the first assert. The problem is that "list objects are unhashable". Paul Rubin posted a case-optimized version (using sets where possible) to the Python group:
http://tinyurl.com/24zchj
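The linked post is not reproduced here, so the following is not Paul Rubin's code. It is only a rough Python 3 sketch of the general idea of using a set where possible, assuming a fallback to a plain list scan for items that cannot be hashed. The name `unique_mixed` and its internals are hypothetical.

```python
def unique_mixed(seq):
    """Order-preserving de-duplication: set lookup for hashable items,
    equality scan for unhashable ones (hypothetical sketch)."""
    t = type(seq)
    seen_hashable = set()  # fast membership test for hashable items
    seen_other = []        # linear scan for unhashable items (lists, dicts, ...)
    kept = []
    for item in seq:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            # item is unhashable: fall back to comparing by equality
            if item in seen_other:
                continue
            seen_other.append(item)
        kept.append(item)
    # Strings need an explicit join; other types are rebuilt from the list.
    return "".join(kept) if t is str else t(kept)


print(unique_mixed([[1], [2], [1]]))
print(unique_mixed([3, 2, 3, 1, 2]))
```

One limitation of this sketch: a hashable item that compares equal to an unhashable one (a frozenset and a set, say) is tracked separately and would not be recognised as a duplicate.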
Thank You for the link, I like that solution :-)