Two functions that return the intersection or union of sets of data, where the form of the data (its nesting and shape) is not a limiting factor.
```python
def intersection(set1, set2, *args):
    """
    Returns intersection of tuples/lists of data, where
    1) the number of sets is greater than 1,
    2) the dimensionality of the sets is not pre-defined,
        s1 = [(1,), 'e', [[1, 7], (4L, 3j, ('', None))], ([2], {'a': 'b'})]
    3) the sets are not required to share common dimensionality,
        s2 = (1, 2, 3, None)
    4) the returned object is a one-dimensional list of objects which are
       neither TupleType nor ListType and has no duplicates.
        intersection(s1, s2) == [1, 2, None]
    """
    result = []
    sets = []
    sets_append = sets.append
    result_append = result.append
    # Flatten and de-duplicate every argument first.
    sets_append(union(set1))
    sets_append(union(set2))
    for arg in args:
        sets_append(union(arg))
    # Keep each object of the first set that also occurs in every other set.
    for obj in sets[0]:
        for i in range(1, len(sets)):
            hit = obj in sets[i]
            if not hit:
                break
        if hit:
            result_append(obj)
    return compact(result)

def union(*args):
    """
    Returns union of tuples/lists of data, where
    1) the dimensionality of the sets is not pre-defined,
        s1 = [(1,), 'e', [[1, 7], (4L, 3j, ('', None))], ([2], {'a': 'b'})]
    2) the sets are not required to share common dimensionality,
        s2 = (1, 2, 3, None)
    3) the union of one set is the set stripped of duplicates,
    4) the returned object is a one-dimensional list of objects which are
       neither TupleType nor ListType and has no duplicates.
        union(s1) == [{'a': 'b'}, 'e', 3j, None, 4L, 7, 2, 1, '']
        union(s1, s2) == [{'a': 'b'}, 'e', 3j, None, 4L, 7, 3, 2, 1, '']
    """
    result = []
    sequenceSet = (list, tuple)  # only lists and tuples are flattened
    result_extend = result.extend
    result_append = result.append
    for arg in args:
        if type(arg) in sequenceSet:
            # Recurse into nested lists/tuples and collect their leaves.
            for obj in arg:
                result_extend(union(obj))
        else:
            result_append(arg)
    return compact(result)

def compact(sequence):
    """
    Returns list of objects in sequence sans duplicates, where
        s1 = (1, 1, (1,), (1,), [1], [1], None, '', 0)
        compact(s1) == [[1], 1, 0, (1,), None, '']
    Unhashable items come first; hashable items follow in dict
    iteration order, so the input order is not preserved.
    """
    result = []
    dict_ = {}
    result_append = result.append
    for i in sequence:
        try:
            # Hashable items are de-duplicated by using them as dict keys.
            dict_[i] = 1
        except TypeError:
            # Unhashable items (lists, dicts, ...) fall back to a linear scan.
            if i not in result:
                result_append(i)
    result.extend(dict_.keys())
    return result
```
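union recurses once per nesting level and calls compact on every intermediate list, so nested input repeats the de-duplication work, and very deeply nested input can exceed Python's default recursion limit. Below is a minimal Python 3 sketch of one alternative, assuming the same rule that only lists and tuples count as containers: `flatten` (a hypothetical name) walks the data with an explicit stack of iterators, and `union_flat` (also hypothetical) removes duplicates once, in a single pass, keeping items in order of first appearance.

```python
def flatten(*args):
    """Yield the non-list, non-tuple leaves of args, depth-first, left to right."""
    # Stack of iterators; no Python-level recursion, so depth is not limited
    # by sys.getrecursionlimit().
    stack = [iter(args)]
    while stack:
        for item in stack[-1]:
            if type(item) in (list, tuple):
                # Descend into the nested container before continuing here.
                stack.append(iter(item))
                break
            yield item
        else:
            # Current iterator is exhausted: resume the parent where it stopped.
            stack.pop()


def union_flat(*args):
    """Return the distinct leaves of all args, in order of first appearance."""
    seen = set()       # hashable leaves already kept
    unhashable = []    # unhashable leaves (e.g. dicts) already kept
    result = []
    for item in flatten(*args):
        try:
            if item in seen:
                continue
            seen.add(item)
        except TypeError:
            # Unhashable item: fall back to an equality scan, as compact does.
            if item in unhashable:
                continue
            unhashable.append(item)
        result.append(item)
    return result
```

With s1 from the docstrings (writing 4 instead of Python 2's 4L), union_flat(s1) returns [1, 'e', 7, 4, 3j, '', None, 2, {'a': 'b'}].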
I am fairly new to programming, and these are the first functions I have put together that might be of some value here. I have tried to make intersection and union as flexible as possible. I do understand that compact is not the fastest way to remove duplicates, but I like it because it is small, flexible, reliable, and generally reasonably fast. Any corrections or advice on how to improve intersection or union will be greatly appreciated. I look forward to both learning from and submitting to this site often. FMHj