Welcome, guest | Sign In | My Account | Store | Cart

Sometimes it is needed to do set-like operations on database query results, or simple lists, without the burden of implementing a class for sets, or importing a separate module. List comprehensions are quick way to do this, here is a collection of them. Although most of them are banal, let us just collect them in one place. Also, they are not the fastest, nor the most elegant, but the quickest to drop into your code if you need a proof of concept.

Python, 176 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
# agents
# agent_id, agent_famname,agent_name
t1=[[100,'Brown','Jack'],
    [101,'Red','James'],
    [102,'Black','John'],
    [103,'White','Jeff'],
    [104,'White','Jasper']]

# clients
# client_id,agent_id,client_name
t2=[[100,100,'McDonalds'],
    [101,100,'KFC'],
    [102,102,'Burger King'],
    [103,103,'Chinese'],
    [104,999,'French']]

# more agents /agents1
# agent_id, agent_famname,agent_name
t3=[[200,'Smith','Jack'],
    [101,'Red','James'],
    [201,'Carpenter','John'],
    [103,'White','Jeff']]



# restriction
# SQL: select * from agents where agent_famname='White'

res=[row for row in t1 if row[1]=='White']

assert res == [[103, 'White', 'Jeff'],
               [104, 'White', 'Jasper']], \
               'restriction failed'


# projection
# SQL: select agent_name,agent_famname from agents

res=[[row[2],row[1]] for row in t1 ]

assert res == [['Jack', 'Brown'],
               ['James', 'Red'],
               ['John', 'Black'],
               ['Jeff', 'White'],
               ['Jasper', 'White']],\
               'projection failed'
               

# cross-product (cartesian product)
# SQL: select * from agents, clients

res= [r1+r2 for r1 in t1 for r2 in t2 ]

assert res == [[100, 'Brown', 'Jack', 100, 100, 'McDonalds'],
               [100, 'Brown', 'Jack', 101, 100, 'KFC'],
               [100, 'Brown', 'Jack', 102, 102, 'Burger King'],
               [100, 'Brown', 'Jack', 103, 103, 'Chinese'],
               [100, 'Brown', 'Jack', 104, 999, 'French'],
               [101, 'Red', 'James', 100, 100, 'McDonalds'],
               [101, 'Red', 'James', 101, 100, 'KFC'],
               [101, 'Red', 'James', 102, 102, 'Burger King'],
               [101, 'Red', 'James', 103, 103, 'Chinese'],
               [101, 'Red', 'James', 104, 999, 'French'],
               [102, 'Black', 'John', 100, 100, 'McDonalds'],
               [102, 'Black', 'John', 101, 100, 'KFC'],
               [102, 'Black', 'John', 102, 102, 'Burger King'],
               [102, 'Black', 'John', 103, 103, 'Chinese'],
               [102, 'Black', 'John', 104, 999, 'French'],
               [103, 'White', 'Jeff', 100, 100, 'McDonalds'],
               [103, 'White', 'Jeff', 101, 100, 'KFC'],
               [103, 'White', 'Jeff', 102, 102, 'Burger King'],
               [103, 'White', 'Jeff', 103, 103, 'Chinese'],
               [103, 'White', 'Jeff', 104, 999, 'French'],
               [104, 'White', 'Jasper', 100, 100, 'McDonalds'],
               [104, 'White', 'Jasper', 101, 100, 'KFC'],
               [104, 'White', 'Jasper', 102, 102, 'Burger King'],
               [104, 'White', 'Jasper', 103, 103, 'Chinese'],
               [104, 'White', 'Jasper', 104, 999, 'French']],\
               'cross product failed'
               
# equi join / inner join
# SQL: select agents.*, clients.* from agents,clients
#      where agents.agent_id=clients.agent_id

res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]

assert res == [[100, 'Brown', 'Jack', 100, 100, 'McDonalds'],
               [100, 'Brown', 'Jack', 101, 100, 'KFC'],
               [102, 'Black', 'John', 102, 102, 'Burger King'],
               [103, 'White', 'Jeff', 103, 103, 'Chinese']],\
               'inner join failed'

               
# left outer join
# SQL: select agents.*, clients.* from agents left outer join clients
#      where agents.agent_id = clients.agent_id

res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if r1[0] not in [r2[1] for r2 in t2]]

assert res == [[100, 'Brown', 'Jack', 100, 100, 'McDonalds'],
               [100, 'Brown', 'Jack', 101, 100, 'KFC'],
               [102, 'Black', 'John', 102, 102, 'Burger King'],
               [103, 'White', 'Jeff', 103, 103, 'Chinese'],
               [101, 'Red', 'James', None, None, None],
               [104, 'White', 'Jasper', None, None, None]],\
               'left outer join failed'

               
# right outer join
# SQL: select agents.*, clients.* from agents right outer join clients
#      where agents.agent_id = clients.agent_id

res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if r2[1] not in [r1[0] for r1 in t1]]

assert res == [[100, 'Brown', 'Jack', 100, 100, 'McDonalds'],
               [100, 'Brown', 'Jack', 101, 100, 'KFC'],
               [102, 'Black', 'John', 102, 102, 'Burger King'],
               [103, 'White', 'Jeff', 103, 103, 'Chinese'],
               [None, None, None, 104, 999, 'French']],\
               'right outer join failed'


# full outer join
# SQL: select agents.*, clients.* from agents full outer join clients
#      where agents.agent_id = clients.agent_id

res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if r1[0] not in [r2[1] for r2 in t2]]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if r2[1] not in [r1[0] for r1 in t1]]

assert res == [[100, 'Brown', 'Jack', 100, 100, 'McDonalds'],
               [100, 'Brown', 'Jack', 101, 100, 'KFC'],
               [102, 'Black', 'John', 102, 102, 'Burger King'],
               [103, 'White', 'Jeff', 103, 103, 'Chinese'],
               [101, 'Red', 'James', None, None, None],
               [104, 'White', 'Jasper', None, None, None],
               [None, None, None, 104, 999, 'French']],\
               'full join failed'
               

# union
# SQL: select * from agents union select * from agents1

res=t1+[r2 for r2 in t3 if r2 not in t1]

assert res == [[100, 'Brown', 'Jack'],
               [101, 'Red', 'James'],
               [102, 'Black', 'John'],
               [103, 'White', 'Jeff'],
               [104, 'White', 'Jasper'],
               [200, 'Smith', 'Jack'],
               [201, 'Carpenter', 'John']], \
               'union failed'


# intersection
# SQL: select * from agents intersect select * from agents1

res=[r2 for r2 in t3 if r2 in t1]

assert res == [[101, 'Red', 'James'],
               [103, 'White', 'Jeff']], \
               'intersection failed'


# difference
# SQL: select * from agents minus select * from agents1

res=[r1 for r1 in t1 if r1 not in t3]

assert res == [[100, 'Brown', 'Jack'],
               [102, 'Black', 'John'],
               [104, 'White', 'Jasper']], \
               'difference failed'

Actually the whole idea started from the cross product for which initially I had a for cycle. Then I took the set operations one by one, most of them turned out to be banal with list comprehensions, but since I took the effort to try them out I included them in the recipe. I am not satisfied with the outer joins though. I hope someone can find a more elegant solution for them.

2 comments

Raymond Hettinger 19 years ago  # | flag

Indexed Tables. You already have an elegant solution that shows-off list comprehensions to good advantage.

The similarity to SQL and speed of execution can both improve with a row match helper function that uses implicit indexing:

def rowmatches(table, col, value, cache={}):
    "Select * from table where col=value"
    cachekey = (id(table), col)
    try:
        return cache[cachekey].get(value, [])
    except KeyError:
        index = {}
        for row in table:
            index.setdefault(row[col], []).append(row)
        cache[cachekey] = index
        return index.get(value, [])

Here are the old and new versions of restriction:

# SQL: select * from agents where agent_famname='White'
res=[row for row in t1 if row[1]=='White']
res=[row for row in rowmatches(t1,1,'White')]

Here are the old and new versions of an inner join:

# SQL: select agents.*, clients.* from agents,clients
#      where agents.agent_id=clients.agent_id
res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]
res= [r1+r2 for r1 in t1 for r2 in rowmatches(t2,1,r1[0])]

Here are the old and new versions of a left join:

# SQL: select agents.*, clients.* from agents left outer join clients
#      where agents.agent_id = clients.agent_id

res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if r1[0] not in [r2[1] for r2 in t2]]
res= [r1+r2 for r1 in t1 for r2 in rowmatches(t2,1,r1[0])]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if not rowmatches(t2,1,r1[0])]

Here are old and new versions of a right outer join:

# SQL: select agents.*, clients.* from agents right outer join clients
#      where agents.agent_id = clients.agent_id
res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if r2[1] not in [r1[0] for r1 in t1]]
res= [r1+r2 for r1 in t1 for r2 in rowmatches(t2,1,r1[0])]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if not rowmatches(t1,0,r2[1])]

Here are the old and new versions of a full outer join

# SQL: select agents.*, clients.* from agents full outer join clients
#      where agents.agent_id = clients.agent_id
res= [r1+r2 for r1 in t1 for r2 in t2 if r1[0]==r2[1]]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if r1[0] not in [r2[1] for r2 in t2]]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if r2[1] not in [r1[0] for r1 in t1]]
res= [r1+r2 for r1 in t1 for r2 in rowmatches(t2,1,r1[0])]+\
     [r1+[None]*len(t2[0]) for r1 in t1 if not rowmatches(t2,1,r1[0])]+\
     [[None]*len(t1[0])+r2 for r2 in t2 if not rowmatches(t1,0,r2[1])]
Matthew Cremeens 4 years, 9 months ago  # | flag

I found this extremely valuable and wanted to thank you for it.

Created by Attila Vásárhelyi on Wed, 30 Oct 2002 (PSF)
Python recipes (4591)
Attila Vásárhelyi's recipes (2)

Required Modules

  • (none specified)

Other Information and Tasks