Computes the strategy oddments for two-player zero-sum games of perfect information. Uses a robust, iterative approximation that can handle dominance, non-square payoff matrices, and games without a saddle-point.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | ''' Approximate the strategy oddments for 2 person zero-sum games of perfect information.
Applies the iterative solution method described by J.D. Williams in his classic
book, The Compleat Strategyst, ISBN 0-486-25101-2. See chapter 5, page 180 for details. '''
from operator import add, neg
def solve(payoff_matrix, iterations=100):
'Return the oddments (mixed strategy ratios) for a given payoff matrix'
transpose = zip(*payoff_matrix)
numrows = len(payoff_matrix)
numcols = len(transpose)
row_cum_payoff = [0] * numrows
col_cum_payoff = [0] * numcols
colpos = range(numcols)
rowpos = map(neg, xrange(numrows))
colcnt = [0] * numcols
rowcnt = [0] * numrows
active = 0
for i in xrange(iterations):
rowcnt[active] += 1
col_cum_payoff = map(add, payoff_matrix[active], col_cum_payoff)
active = min(zip(col_cum_payoff, colpos))[1]
colcnt[active] += 1
row_cum_payoff = map(add, transpose[active], row_cum_payoff)
active = -max(zip(row_cum_payoff, rowpos))[1]
value_of_game = (max(row_cum_payoff) + min(col_cum_payoff)) / 2.0 / iterations
return rowcnt, colcnt, value_of_game
###########################################
# Example solutions to two pay-off matrices
print solve([[2,3,1,4], [1,2,5,4], [2,3,4,1], [4,2,2,2]]) # Example on page 185
print solve([[4,0,2], [6,7,1]]) # Exercise 2 number 3
|
This is a general purpose tool for solving all of the games in the J.D. Williams book.
The approach is to gather strategy selection statistics from a succession of plays where each player makes the best countermove based upon the opponent's cumulative history of plays.
The output is the strategy oddment (the relative ratio that a player should play each strategy). For instance, given the payoff matrix: <pre> A B C _ _ _ D | 4 0 2 E | 6 7 1 </pre>
The [75, 25] return value means that strategy D should be played 75% of the time and that E should be played 25% of the time. Likewise, the opponent's oddment of [0, 13, 87] means that the opponent should never play A and should play B and C in about a 1:7 ratio (approximately). The value of the game is about 1.765.
Given an m-by-n matrix, the running time is O((m+n)*iterations). So, this function performs respectably even with very large payoff matrices.
on running the code, there is an error in python 3.4 "object of type 'zip' has no len()" can you please check for that regards, Ashish