SIAM Journal on Control and Optimization, Vol. 36, No. 4, pp. 1331-1347, 1998
Simplifying optimal strategies in stochastic games
We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary $\varepsilon$-optimal strategies, for all $\varepsilon > 0$, and the existence of Markov optimal strategies. We present such a construction for which we do not even need to know these optimal strategies. Furthermore, an example demonstrates that the existence of optimal strategies does not imply the existence of stationary optimal strategies, so the result is sharp.

More generally, one can evaluate a strategy $\pi$ for the maximizing player, player 1, by the reward $\phi_s(\pi)$ that $\pi$ guarantees to him when starting in state $s$. A strategy $\pi$ is called nonimproving if $\phi_s(\pi) \ge \phi_s(\pi[h])$ for all $s$ and for all finite histories $h$ with final state $s$, where $\pi[h]$ is the strategy $\pi$ conditional on the history $h$. Using the evaluation $\phi$, we may define the relation "$\varepsilon$-better" between strategies: a strategy $\pi_1$ is called $\varepsilon$-better than $\pi_2$ if $\phi_s(\pi_1) \ge \phi_s(\pi_2) - \varepsilon$ for all $s$. We show that for any nonimproving strategy $\pi$ and any $\varepsilon > 0$ there exists an $\varepsilon$-better stationary strategy, as well as a $0$-better Markov strategy. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies.

Finally, we briefly discuss some other extensions. Among others, we indicate possible simplifications of strategies that are optimal only for particular initial states, by "almost stationary" $\varepsilon$-optimal strategies, for all $\varepsilon > 0$, and by "almost Markov" optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.
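For quick reference, the definitions and the main implication above can be restated compactly as follows; this is only a sketch in the abstract's notation, not the paper's formal theorem statement.
\[
  \pi \text{ is nonimproving} \iff
  \phi_s(\pi) \ge \phi_s(\pi[h])
  \quad \text{for all } s \text{ and all finite histories } h \text{ ending in } s,
\]
\[
  \pi_1 \text{ is } \varepsilon\text{-better than } \pi_2 \iff
  \phi_s(\pi_1) \ge \phi_s(\pi_2) - \varepsilon
  \quad \text{for all } s,
\]
\[
  \pi \text{ nonimproving} \;\Longrightarrow\;
  \begin{cases}
    \forall \varepsilon > 0 \ \exists\ \text{stationary } \sigma :
      \ \phi_s(\sigma) \ge \phi_s(\pi) - \varepsilon \ \text{ for all } s,\\
    \exists\ \text{Markov } \mu :
      \ \phi_s(\mu) \ge \phi_s(\pi) \ \text{ for all } s.
  \end{cases}
\]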