화학공학소재연구정보센터
SIAM Journal on Control and Optimization, Vol.43, No.6, 2109-2131, 2005
Algorithms for countable state Markov decision models with an absorbing set
We consider a countable state Markov decision process with bounded reward function and an absorbing set. At first we generalize known properties and derive new properties of the critical discount factor, which is roughly defined as the maximal discount factor under which for V, the maximal expected infinite-stage discounted reward, there is guaranteed existence, boundedness, and computability by the successive approximation method. The emphasis of the paper is on algorithms for computing V exactly (recursion in state space and policy iteration) or approximately ( value iteration combined with an extrapolation and finite state approximation). Our extrapolation method is motivated by and based on the Perron-Frobenius theory for nonlinear operators. As a by-product we obtain an efficient algorithm for determining the distribution of the entrance time of a Markov chain into an absorbing set. Further results concern asymptotically epsilon-optimal policies and a new turnpike theorem. Some of the results need tightness of the transition law, which turns out to be equivalent to compactness of a nonlinear operator, which is crucial for our study.