Sequential Decision Making With Coherent Risk

Tamar A; Chow Y; Ghavamzadeh M; Mannor S

IEEE Transactions on Automatic Control, Vol.62, No.7, 3323-3338, 2017

DOI: 10.1109/TAC.2016.2644871

We provide sampling-based algorithms for optimization under a coherent-risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk (CVaR) and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as reinforcement learning or approximate dynamic programming with a parameterized policy. Such problems cannot be solved using previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient methods, while for dynamic risk measures, we use actor-critic type algorithms.
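As a rough illustration of the two risk measures named in the abstract, the sketch below estimates them from a batch of sampled costs. It is not the paper's algorithm: it only evaluates the risk of a fixed sample and does not compute any gradient. The function names, the convention that `alpha` is the tail fraction of worst (largest) costs, and the order-2 upper semideviation with weight `beta` are assumptions chosen for this example.

```python
import math


def empirical_cvar(costs, alpha):
    """Sample estimate of CVaR via min over t of t + mean((z - t)^+) / alpha.

    Assumption: alpha in (0, 1] is the fraction of worst (largest) costs.
    The objective is convex and piecewise linear in t with breakpoints at
    the samples, so it is enough to search over the sample values.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(costs)
    return min(
        t + sum(max(z - t, 0.0) for z in costs) / (alpha * n)
        for t in costs
    )


def empirical_mean_semideviation(costs, beta):
    """Sample estimate of mean + beta * sqrt(mean(((z - mean)^+)^2)).

    Assumption: beta in [0, 1] and an order-2 upper semideviation.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    n = len(costs)
    mean = sum(costs) / n
    upper = sum(max(z - mean, 0.0) ** 2 for z in costs) / n
    return mean + beta * math.sqrt(upper)


if __name__ == "__main__":
    sampled_costs = [1.0, 2.0, 3.0, 4.0]  # hypothetical sampled costs
    print(empirical_cvar(sampled_costs, alpha=0.5))
    print(empirical_mean_semideviation(sampled_costs, beta=0.5))
```

With `alpha = 1` the CVaR estimate reduces to the sample mean, and with `beta = 0` the mean-semideviation does the same, so both estimators interpolate between a risk-neutral and a risk-averse objective.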

Keywords: Coherent risk; dynamic programming; Markov decision processes; policy gradient