IEEE Transactions on Automatic Control, Vol.60, No.4, 1164-1168, 2015
Folding Algorithm for Policy Evaluation for Markov Decision Processes With Quasi-Birth Death Structure
This technical note presents a new numerical procedure for policy evaluation of Stochastic Shortest Path Markov Decision Processes (MDPs) having a level independent Quasi Birth-Death structure. The algorithm is derived using a method analogous to the folding method of Ye and Li (1994). The computational complexity is O(M(3)log(2)N) + O((MN)-N-2), where the process has N levels and M phases. A simple example involving the control of two queues is presented to illustrate the application of this efficient policy evaluation algorithm to compare and rank control policies.