|
Description:
Computes the strategy oddments for two-player zero-sum games of perfect information. Uses a robust, iterative approximation that can handle dominance, non-square payoff matrices, and games without a saddle-point.
Source: Text Source
''' Approximate the strategy oddments for 2 person zero-sum games of perfect information.
Applies the iterative solution method described by J.D. Williams in his classic
book, The Compleat Strategyst, ISBN 0-486-25101-2. See chapter 5, page 180 for details. '''
from operator import add, neg
def solve(payoff_matrix, iterations=100):
'Return the oddments (mixed strategy ratios) for a given payoff matrix'
transpose = zip(*payoff_matrix)
numrows = len(payoff_matrix)
numcols = len(transpose)
row_cum_payoff = [0] * numrows
col_cum_payoff = [0] * numcols
colpos = range(numcols)
rowpos = map(neg, xrange(numrows))
colcnt = [0] * numcols
rowcnt = [0] * numrows
active = 0
for i in xrange(iterations):
rowcnt[active] += 1
col_cum_payoff = map(add, payoff_matrix[active], col_cum_payoff)
active = min(zip(col_cum_payoff, colpos))[1]
colcnt[active] += 1
row_cum_payoff = map(add, transpose[active], row_cum_payoff)
active = -max(zip(row_cum_payoff, rowpos))[1]
value_of_game = (max(row_cum_payoff) + min(col_cum_payoff)) / 2.0 / iterations
return rowcnt, colcnt, value_of_game
print solve([[2,3,1,4], [1,2,5,4], [2,3,4,1], [4,2,2,2]])
print solve([[4,0,2], [6,7,1]])
Discussion:
This is a general purpose tool for solving all of the games in the J.D. Williams book.
The approach is to gather strategy selection statistics from a succession of plays where each player makes the best countermove based upon the opponent's cumulative history of plays.
The output is the strategy oddment (the relative ratio that a player should play each strategy). For instance, given the payoff matrix:
A B C
_ _ _
D | 4 0 2
E | 6 7 1
The [75, 25] return value means that strategy D should be played 75% of the time and that E should be played 25% of the time. Likewise, the opponent's oddment of [0, 13, 87] means that the opponent should never play A and should play B and C in about a 1:7 ratio (approximately). The value of the game is about 1.765.
Given an m-by-n matrix, the running time is O((m+n)*iterations). So, this function performs respectably even with very large payoff matrices.
|