IEEE Transactions on Automatic Control, Vol.59, No.9, 2574-2579, 2014
Risk-Constrained Markov Decision Processes
We propose a new constrained Markov decision process framework with risk-type constraints. The risk metric we use is Conditional Value-at-Risk (CVaR), which is gaining popularity in finance. CVaR is a conditional expectation in which the conditioning event is defined by the level of the tail probability. We propose an iterative offline algorithm to find the risk-constrained optimal control policy. A two-time-scale, stochastic-approximation-inspired 'learning' variant is also sketched, and its convergence to the optimal risk-constrained policy is proved.
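
As an illustration of the tail-conditioned expectation described above, the following minimal sketch estimates CVaR from a finite sample of costs. It is not the algorithm of the paper. The function name `empirical_cvar`, the convention that larger costs are worse, and the reading of `alpha` as a confidence level (so the tail holds the worst 1 - alpha fraction of samples) are assumptions made for this example only.

```python
import math


def empirical_cvar(costs, alpha):
    """Return (VaR, CVaR) estimated from a sample of costs.

    Assumed convention: larger cost is worse, and alpha in [0, 1) is the
    confidence level, so the tail is the worst (1 - alpha) fraction.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs)
    n = len(ordered)
    if n == 0:
        raise ValueError("costs must be non-empty")
    # Number of samples in the tail; keep at least one sample.
    k = max(1, math.ceil((1.0 - alpha) * n))
    tail = ordered[n - k:]
    var = tail[0]          # smallest cost inside the tail
    cvar = sum(tail) / k   # average cost over the tail
    return var, cvar


if __name__ == "__main__":
    print(empirical_cvar([1.0, 2.0, 3.0, 10.0], alpha=0.5))
```

With `alpha = 0` the tail is the whole sample and the estimate reduces to the ordinary mean, which shows how the tail level controls the conditioning.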