REVERSIBLE MARKOV DECISION PROCESSES WITH AN AVERAGE-REWARD CRITERION

Cogill R; Peng C

SIAM Journal on Control and Optimization, Vol.51, No.1, 402-418, 2013

A wide variety of stochastic models have an important property known as reversibility. The analysis of reversible Markov chains is often significantly simpler than analysis of general Markov chains, particularly since there are often simple closed-form expressions for their invariant probability mass functions. In this paper we study the structure of optimal control policies for Markov decision processes with reversible dynamics. For optimal control of Markov decision processes, significant analytical and computational simplifications can arise as a result of reversibility. Here we present an analysis of a special class of reversible Markov decision processes, namely those that contain a cycle through their communication graph under any policy. These results provide encouraging evidence that similar simplifications hold for more general models, such as reversible processes without cycles and partition reversible processes. These results are demonstrated on a control-based variant of the Metropolis-Hastings algorithm, yielding faster convergence and better asymptotic performance than a conventional algorithm when applied to a combinatorial optimization problem.

Keywords:Markov decision processes;reversible Markov chains;Metropolis-Hastings algorithm