IEEE Transactions on Automatic Control, Vol.55, No.1, 201-207, 2010
Approximating Ergodic Average Reward Continuous-Time Controlled Markov Chains
We study the approximation of an ergodic average-reward continuous-time Markov decision process (MDP) with a denumerable state space by means of a sequence of MDPs. Our results include the convergence of the corresponding optimal policies and optimal gains. For a controlled upwardly skip-free process, we present computational results that illustrate the convergence theorems.
Keywords: Approximation of control problems; Ergodic Markov decision processes (MDPs); Policy iteration algorithm
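
The abstract does not give the model or the algorithm in detail, so the following is only a minimal sketch of the general idea under stated assumptions: a denumerable-state controlled chain is replaced by finite truncations on states 0..N, and average-reward policy iteration is run on each truncation so that the resulting gains can be compared as N grows. The single-server queue with controllable service rate, the numerical rates and costs, the truncation rule (no arrivals in state N), and the function names `build_truncated_model` and `policy_iteration` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def build_truncated_model(n_max, lam=1.0, mus=(1.5, 3.0), costs=(0.0, 2.0)):
    """Hypothetical example: a queue with controlled service rate, truncated at n_max.

    Upward jumps are by one unit only (rate lam), so the process is upwardly
    skip-free. Action a selects service rate mus[a] at running cost costs[a].
    The reward rate is minus (queue length + action cost).
    """
    n_states = n_max + 1
    n_actions = len(mus)
    q = np.zeros((n_actions, n_states, n_states))  # q[a, i, j]: transition rates
    r = np.zeros((n_states, n_actions))            # r[i, a]: reward rates
    for a in range(n_actions):
        for i in range(n_states):
            if i < n_max:
                q[a, i, i + 1] = lam       # arrival (suppressed at the boundary)
            if i > 0:
                q[a, i, i - 1] = mus[a]    # service completion
            q[a, i, i] = -q[a, i].sum()    # rows of a generator sum to zero
            r[i, a] = -(i + costs[a])
    return q, r


def policy_iteration(q, r, max_iter=1000, tol=1e-9):
    """Average-reward policy iteration for a finite continuous-time MDP.

    Assumes every stationary policy induces an irreducible chain.
    Returns the gain and the policy at which no state can be improved.
    """
    n_actions, n_states, _ = q.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        q_f = q[policy, states]   # generator rows under the current policy
        r_f = r[states, policy]
        # Policy evaluation: solve g = r_f(i) + sum_j q_f(i, j) h(j) with h(0) = 0.
        # The unknown vector is (g, h(1), ..., h(N)).
        a_mat = -q_f.copy()
        a_mat[:, 0] = 1.0         # column 0 now carries the coefficient of g
        sol = np.linalg.solve(a_mat, r_f)
        gain = sol[0]
        h = sol.copy()
        h[0] = 0.0
        # Policy improvement: maximise r(i, a) + sum_j q(a, i, j) h(j) over a.
        scores = r + np.einsum("aij,j->ia", q, h)
        improved = scores.max(axis=1) > scores[states, policy] + tol
        if not improved.any():
            return gain, policy
        # Change the action only where there is a strict improvement.
        policy = np.where(improved, scores.argmax(axis=1), policy)
    raise RuntimeError("policy iteration did not converge")


if __name__ == "__main__":
    for n_max in (10, 20, 40, 80):
        gain, policy = policy_iteration(*build_truncated_model(n_max))
        print(n_max, gain, policy[:6])
```

Running the script prints the gain and the first few actions for each truncation level, which gives a simple way to observe whether the gains and policies settle as N increases. Restricting the model to the single service rate 1.5 at zero cost reduces it to a truncated M/M/1 queue, whose long-run mean queue length of 2 offers a sanity check on the policy evaluation step.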