Approximate Value Iteration for Risk-Aware Markov Decision Processes

Yu PQ; Haskell WB; Xu H

IEEE Transactions on Automatic Control, Vol.63, No.9, 3135-3142, 2018

DOI: 10.1109/TAC.2018.2790261

We consider large-scale Markov decision processes (MDPs) with a time-consistent risk measure of variability in cost under the risk-aware MDP paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be solved using dynamic programming for small- to medium-sized problems. However, due to the "curse of dimensionality," MDPs that model real-life problems are typically prohibitively large for such approaches. In this technical note, we employ an approximate dynamic programming approach and develop a family of simulation-based algorithms to approximately solve large-scale risk-aware MDPs with time-consistent risk measures. In parallel, we develop a unified convergence analysis technique to derive sample complexity bounds for this new family of algorithms.
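As a rough illustration of the general idea of simulation-based value iteration with a time-consistent risk measure, the following minimal sketch may help. It is not the algorithm from the paper: it assumes a small tabular MDP with no function approximation, and it uses conditional value-at-risk (CVaR), estimated by a simple tail average, as the one-step risk measure. The function names, the parameters (`alpha`, `n_samples`, `n_iters`), and the data layout of `P` and `c` are all hypothetical choices made for this example.

```python
import math
import random


def empirical_cvar(samples, alpha):
    """Tail-average estimate of CVaR at level alpha for cost samples.

    Averages the largest ceil((1 - alpha) * n) samples (at least one).
    This is an assumed, simple estimator chosen for illustration.
    """
    n = len(samples)
    # Rounding guards against floating-point noise in (1 - alpha) * n.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    worst = sorted(samples, reverse=True)[:k]
    return sum(worst) / k


def risk_aware_value_iteration(P, c, gamma, alpha, n_samples, n_iters, seed=0):
    """Sample-based value iteration with a nested one-step CVaR (a sketch).

    P[s][a] is a list of next-state probabilities, c[s][a] is the stage cost.
    Each backup replaces the exact risk of (cost + discounted next value)
    with an empirical estimate from n_samples simulated next states.
    """
    rng = random.Random(seed)
    n_states = len(P)
    n_actions = len(P[0])
    V = [0.0] * n_states
    for _ in range(n_iters):
        V_new = []
        for s in range(n_states):
            q_values = []
            for a in range(n_actions):
                # Simulate next states instead of using P[s][a] directly.
                next_states = rng.choices(
                    range(n_states), weights=P[s][a], k=n_samples
                )
                totals = [c[s][a] + gamma * V[s2] for s2 in next_states]
                q_values.append(empirical_cvar(totals, alpha))
            # Costs are minimized, so take the least risky action.
            V_new.append(min(q_values))
        V = V_new
    return V


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP used only to exercise the sketch.
    P = [
        [[0.9, 0.1], [0.5, 0.5]],
        [[0.2, 0.8], [0.6, 0.4]],
    ]
    c = [[1.0, 0.5], [2.0, 3.0]]
    print(risk_aware_value_iteration(P, c, gamma=0.9, alpha=0.8,
                                     n_samples=200, n_iters=100))
```

Setting `alpha` to 0 makes the tail average the plain sample mean, which gives a risk-neutral sampled backup; values of `alpha` closer to 1 weight the worst simulated outcomes more heavily.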

Keywords: Approximation algorithms; function approximation; Markov processes; risk measures