Chemical Engineering and Materials Research Information Center
Automatica, Vol.37, No.7, 1007-1018, 2001
Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs
A zero-sum stochastic game between two finite Markov chains with unknown transition matrices and average payoffs is considered. The control objective of each participant is to optimize the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle-point (equilibrium point), together with a new normalization procedure used in the adaptive strategy, which asymptotically realizes this equilibrium. The saddle-point is shown to be unique. The convergence properties are stated, and the adaptive control algorithm is shown to converge at a rate of order $n^{-1/3}$.
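The role of regularization in making the saddle-point unique can be illustrated on a much simpler problem. The sketch below is not the paper's algorithm: it assumes a static matrix game with a known payoff matrix `A`, no Markov chain dynamics, and no normalization procedure. The function names, the quadratic (Tikhonov-style) regularizer with weight `delta`, and the step size are all illustrative assumptions. The row player minimizes and the column player maximizes f(x, y) = x'Ay + (delta/2)|x|^2 - (delta/2)|y|^2, which is strongly convex in x and strongly concave in y, so its saddle-point on the probability simplices is unique.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:          # condition holds on a prefix; keep the last t
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def regularized_saddle(A, delta=0.5, step=0.05, iters=2000):
    """Projected gradient descent-ascent on the regularized payoff
    f(x, y) = x'Ay + (delta/2)|x|^2 - (delta/2)|y|^2.
    Hypothetical helper for illustration; A is assumed known."""
    m, n = len(A), len(A[0])
    x = [1.0] + [0.0] * (m - 1)      # arbitrary starting mixed strategies
    y = [0.0] * (n - 1) + [1.0]
    for _ in range(iters):
        # Gradients are evaluated at the current point (simultaneous update).
        gx = [sum(A[i][j] * y[j] for j in range(n)) + delta * x[i]
              for i in range(m)]
        gy = [sum(A[i][j] * x[i] for i in range(m)) - delta * y[j]
              for j in range(n)]
        x = project_simplex([x[i] - step * gx[i] for i in range(m)])  # descent
        y = project_simplex([y[j] + step * gy[j] for j in range(n)])  # ascent
    return x, y


# Matching pennies: the mixed equilibrium is the uniform strategy.
x_star, y_star = regularized_saddle([[1.0, -1.0], [-1.0, 1.0]])
```

For this payoff matrix both strategies approach the uniform distribution. With `delta = 0` the same iteration is no longer guaranteed to settle, which is the practical motivation for adding a regularizing term before running an iterative scheme.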