
Python has a powerful suite of tools for comparing lists by way of sets and frozensets. Here are a few examples and conveniences that many newcomers, and even a few seasoned developers, are unaware of.

Python, 146 lines
#!/usr/bin/env python
""" Convenience methods for list comparison & manipulation

Fast and useful: set/frozenset* retain only unique values, so
duplicates are removed automatically.

    lr_union    union
        merge values, remove duplicates

    lr_diff     difference
        left elements, subtracting any in common with right

    lr_intr     intersection
        only common values found in both left and right

    lr_symm     symmetric_difference
        omit values found in both left and right

    lr_cont     issuperset
        test whether left contains all values from right

    * Unlike set, frozenset is immutable (and therefore hashable).
        Neither type preserves the source order.

"""

lr_union = lambda l, r: list(set(l).union(r))
lr_diff = lambda l, r: list(set(l).difference(r))
lr_intr = lambda l, r: list(set(l).intersection(r))
lr_symm = lambda l, r: list(set(l).symmetric_difference(r))
lr_cont = lambda l, r: set(l).issuperset(r)

# quiet variant of lr_intr: tolerates None passed as the right-hand list
lrq_intr = lambda l, r: list(set(l).intersection(r or []))

# ------------ NOTHING BELOW HERE IS REQUIRED --------------

def tests():
    """ doctest tests/examples for set and set conveniences

    A few examples without the conveniences above.

    Strings are sequences too, so they can be passed where appropriate
    >>> sorted(set('aabbcc')) # only unique values; sorted for stable output
    ['a', 'b', 'c']

    Do the work and cast as list (switch to tuple if preferred)
    >>> sorted(list(set('aabbcc'))) # sorted only to keep the output stable
    ['a', 'b', 'c']

    Using list does not remove duplicates
    >>> list('aabbcc') # list is not unique
    ['a', 'a', 'b', 'b', 'c', 'c']

    Simple join of lists, note the redundant values
    >>> ['a', 'a', 'b'] + ['b', 'c', 'c']
    ['a', 'a', 'b', 'b', 'c', 'c']

    Join both lists, return only unique values, join list before set (slower)
    >>> sorted(set(['a', 'a', 'b'] + ['b', 'c', 'c']))
    ['a', 'b', 'c']

    Join lists, as above, using the built-in set.union method (faster)
    >>> sorted(lr_union(['a', 'a', 'b'], ['b', 'c', 'c']))
    ['a', 'b', 'c']

    Remove right values from left
    >>> lr_diff(['a', 'b'], ['b', 'c'])
    ['a']

    Remove as above, swapped/reordered inputs to remove left from right
    >>> lr_diff(['b', 'c'], ['a', 'b'])
    ['c']

    Common elements
    >>> lr_intr(['a', 'b'], ['b', 'c'])
    ['b']

    Unique elements (remove the common, intersecting, values)
    Note: equivalent to (left - right) + (right - left).
    >>> sorted(lr_symm(['a', 'b'], ['b', 'c']))
    ['a', 'c']

    Is left a superset of (does it contain) the right
    >>> lr_cont(['a', 'b'], ['b', 'c'])
    False
    >>> lr_cont(['a', 'b', 'c'], ['b', 'c'])
    True

    Marginally less trite examples using words
    >>> lwords = 'the quick brown fox'.split()
    >>> rtags = 'brown,fox,jumps,over'.split(',')

    Return all unique words from both lists.
    >>> sorted(lr_union(lwords, rtags))
    ['brown', 'fox', 'jumps', 'over', 'quick', 'the']

    Return unique common, intersecting, words. Members of left AND right only.
    >>> sorted(lr_intr(lwords, rtags))
    ['brown', 'fox']

    Return unique uncommon words. Members of left OR right, but not both.
    >>> sorted(lr_symm(lwords, rtags))
    ['jumps', 'over', 'quick', 'the']

    Note: intersection + symmetric = union, but don't count on their order!

    """

def insecure_demo():
    """Compact method to demo functionality"""

    left, right = list('aab'), list('bcc')
    both = left + right
    both.sort()

    lamb_dict = {'Difference (Remainder of subtract: left - right)': 'lr_diff',
                'Intersection (Only in left AND right)': 'lr_intr',
                'Symmetric Difference (Only in left OR right)': 'lr_symm',
                'Union (Unique list of ALL values)': 'lr_union'}

    print("Demo methods for comparing lists using set/frozenset\n")
    print("'left' list: %s" % repr(left))
    print("'right' list: %s" % repr(right))
    print("'both' lists: %s\n" % repr(both))
    print('-' * 30)

    for lamb_desc, lamb_name in sorted(lamb_dict.items()):
        lamb_func = globals()[lamb_name]
        resp = lamb_func(left, right)
        resp.sort()

        print(lamb_desc)
        print('>>> %s(%r, %r)' % (lamb_name, left, right))
        print(resp)
        print('-' * 30)

if __name__ == '__main__':

    import doctest
    doctest.testmod()

    insecure_demo()

    URL = 'http://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset'
    print("\nBe sure to visit:\n%s" % URL)

A few 'convenience' lambdas for list-'fu'

These are tasks I have repeatedly seen handled with roll-your-own code. Life is too short for that when good solutions are so close at hand.
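As a rough illustration of the roll-your-own pattern next to the set-based one, here is a minimal sketch. The function names `common_by_hand` and `common_by_set` are made up for this example and are not part of the recipe.

```python
def common_by_hand(left, right):
    """Hand-rolled intersection: nested membership tests on lists."""
    found = []
    for item in left:
        # "in" on a list scans it, so this loop is quadratic overall
        if item in right and item not in found:
            found.append(item)
    return found


def common_by_set(left, right):
    """Same values via set.intersection; result order is arbitrary."""
    return list(set(left).intersection(right))


lwords = 'the quick brown fox'.split()
rtags = 'brown,fox,jumps,over'.split(',')

print(common_by_hand(lwords, rtags))
print(sorted(common_by_set(lwords, rtags)))  # sorted only for stable output
```

Both calls find the same values; the set version is shorter and avoids the repeated list scans.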

Ultimately, if you have been, or are, contemplating writing list-parsing routines to compare, find or eliminate common elements, reduce redundant value-elements, etc., then you'll like sets and frozensets.
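The recipe itself only uses `set`. As a small sketch of where `frozenset` earns its place: it is immutable and hashable, so it can be a member of another set (or a dictionary key), which a plain `set` cannot. The `edges` data below is invented for illustration.

```python
# An undirected edge is the same pair regardless of direction
edges = [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('a', 'c')]

# frozenset is hashable, so it can be stored inside another set
unique_edges = set(frozenset(edge) for edge in edges)

print(len(unique_edges))
print(frozenset(['b', 'a']) in unique_edges)

# A plain set is mutable and therefore unhashable
try:
    {set('ab')}
except TypeError as exc:
    print('plain set rejected: %s' % exc)
```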

All the gory details are in doc-strings and comments, so I won't repeat them here.

The doctests are silent when there are no errors, so a crude demo function also runs these functions and prints their results.

Since this is likely of more use to newcomers, I have included some similar list functionality to highlight differences and usefulness.
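One difference worth seeing in code is ordering: a list keeps the source order, while a set does not. When order matters, one possible approach (a sketch, assuming Python 3.7 or later, where `dict` keeps insertion order) is to deduplicate with `dict.fromkeys` and use a set only for membership tests. The helper names here are hypothetical.

```python
def unique_in_order(items):
    """Drop duplicates, keeping the first occurrence of each value."""
    # Assumes Python 3.7+, where dict preserves insertion order
    return list(dict.fromkeys(items))


def intersect_in_order(left, right):
    """Common values, reported in the order they first appear in left."""
    lookup = set(right)  # set gives fast membership tests
    return [item for item in unique_in_order(left) if item in lookup]


print(unique_in_order('aabbcc'))
print(intersect_in_order('the quick brown fox'.split(),
                         'fox,brown,jumps,over'.split(',')))
```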

The inclusion of anonymous functions as "wrapper-code" using lambda is not essential, or even recommended, for most use cases. Still, they can be useful, especially if renamed to clarify their purpose in whatever context they are deployed.
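For comparison, here is how the same wrappers might look as ordinary named functions in a specific context. This is only a sketch: the tag-checking scenario and the names `missing_tags` and `has_all_tags` are assumptions made up for the example.

```python
def missing_tags(required, present):
    """Return the required tags that are absent, sorted for stable output."""
    return sorted(set(required).difference(present))


def has_all_tags(required, present):
    """True when every required tag appears among the present ones."""
    return set(present).issuperset(required)


present = 'the quick brown fox'.split()

print(missing_tags(['brown', 'fox', 'jumps'], present))
print(has_all_tags(['brown', 'fox'], present))
```

A `def` also gives the function a real name in tracebacks and a place for a docstring, which an assigned lambda does not.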

So copy/paste the useful parts into your script(s), or rewrite them as needed. These are so useful that you'll remember the syntax after using them a couple of times. They are far too trivial to merit the dependency baggage of a library.