Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs

Najim K; Poznyak AS; Gomez E

Automatica, Vol.37, No.7, 1007-1018, 2001

DOI 10.1016/S0005-1098(01)00050-4

A zero-sum stochastic game of two finite Markov chains with unknown transition matrices and average payoffs is considered. The control objective of each participant is to optimize the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle-point (equilibrium point) and a new normalization procedure used in the adaptive strategy that asymptotically realizes this equilibrium. The saddle-point is shown to be unique. The convergence properties are stated, and it is shown that this adaptive control algorithm has a convergence rate of order n^(-1/3).

Keywords: stochastic game; adaptive control; controlled Markov chains