화학공학소재연구정보센터
SIAM Journal on Control and Optimization, Vol.51, No.1, 402-418, 2013
REVERSIBLE MARKOV DECISION PROCESSES WITH AN AVERAGE-REWARD CRITERION
A wide variety of stochastic models have an important property known as reversibility. The analysis of reversible Markov chains is often significantly simpler than analysis of general Markov chains, particularly since there are often simple closed-form expressions for their invariant probability mass functions. In this paper we study the structure of optimal control policies for Markov decision processes with reversible dynamics. For optimal control of Markov decision processes, significant analytical and computational simplifications can arise as a result of reversibility. Here we present an analysis of a special class of reversible Markov decision processes, namely those that contain a cycle through their communication graph under any policy. These results provide encouraging evidence that similar simplifications hold for more general models, such as reversible processes without cycles and partition reversible processes. These results are demonstrated on a control-based variant of the Metropolis-Hastings algorithm, yielding faster convergence and better asymptotic performance than a conventional algorithm when applied to a combinatorial optimization problem.