IEEE Transactions on Automatic Control, Vol. 53, No. 9, pp. 2112-2116, 2008
Robust Optimality for Discounted Infinite-Horizon Markov Decision Processes With Uncertain Transition Matrices
We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain transition matrices, restricting attention to the deterministic policy space. The transition matrices are classified as either independent or correlated. We propose a generalized robust optimality criterion that reduces to several popular optimality criteria as special cases, and under which an optimal or near-optimal policy exists for any uncertain transition matrix. We develop theorems that guarantee the optimality or near-optimality of a stationary policy within the deterministic policy space.
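To make the setting concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes the worst-case (max-min) criterion, which is one commonly used robust criterion, and it assumes the "independent" case means that the uncertain transition row of each state-action pair can be chosen separately from a finite set of candidate distributions. The function names, the data layout, and the two-state example are all hypothetical.

```python
def q_worst(rewards, candidates, gamma, v, s, a):
    """Worst-case one-step lookahead value of action a in state s.

    candidates[s][a] is an ASSUMED finite list of possible next-state
    distributions; the adversary picks the one minimizing expected value.
    """
    worst_next = min(
        sum(p_t * v_t for p_t, v_t in zip(p, v)) for p in candidates[s][a]
    )
    return rewards[s][a] + gamma * worst_next


def robust_value_iteration(rewards, candidates, gamma, tol=1e-10, max_iter=10_000):
    """Max-min value iteration for a discounted MDP (illustrative sketch)."""
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(max_iter):
        # Robust Bellman update: best action against the worst candidate row.
        new_v = [
            max(q_worst(rewards, candidates, gamma, v, s, a)
                for a in range(len(rewards[s])))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # Greedy stationary deterministic policy with respect to the final values.
    policy = [
        max(range(len(rewards[s])),
            key=lambda a, s=s: q_worst(rewards, candidates, gamma, v, s, a))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical example: state 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (known transition, reward 1.0) and
# action 1 is "risky" (reward 1.5, but may jump to the absorbing state).
rewards = [[1.0, 1.5], [0.0]]
candidates = [
    [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
    [[[0.0, 1.0]]],
]
values, policy = robust_value_iteration(rewards, candidates, gamma=0.5)
print(values, policy)
```

In this toy instance the risky action would be preferred if its favorable transition row were known to hold, but under the worst-case criterion the stationary policy selects the safe action in state 0. A correlated uncertainty model, where the rows cannot be chosen independently, would not in general decompose state by state in this way.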