STRONG UNIFORM VALUE IN GAMBLING HOUSES AND PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES

Venel X; Ziliotto B

SIAM Journal on Control and Optimization, Vol.54, No.4, 1983-2008, 2016

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely, the strong uniform value. This solves two open problems. First, it shows that for any epsilon > 0, the decision maker has a pure strategy sigma that is epsilon-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any epsilon > 0, the decision maker can guarantee the limit of the n-stage value minus epsilon in the infinite problem, where the payoff is the expectation of the inferior limit of the time average payoff.
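
The two results can be written out as a rough formal sketch. The notation here is an assumption made for illustration and is not taken from the paper: x denotes the initial state, sigma a strategy, r_m the payoff at stage m, and v_n(x) the value of the n-stage problem.

```latex
% Assumed notation (illustrative, not the paper's own):
%   x      initial state
%   \sigma strategy of the decision maker
%   r_m    payoff received at stage m
% Requires amsmath (for \text) and amssymb (for \mathbb).

% Value of the n-stage problem: best expected average payoff over n stages.
\[
  v_n(x) \;=\; \sup_{\sigma}\;
  \mathbb{E}^{\sigma}_{x}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right]
\]

% First result: one pure strategy is epsilon-optimal in every long enough
% finite problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, N
  \ \text{ such that }\
  \mathbb{E}^{\sigma}_{x}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; v_n(x) - \varepsilon
  \quad \text{for all } n \ge N .
\]

% Second result: in the infinite problem, with the liminf of the time
% average as payoff, the limit of v_n can be guaranteed up to epsilon.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma
  \ \text{ such that }\
  \mathbb{E}^{\sigma}_{x}\!\left[\liminf_{n \to \infty}
  \frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; \lim_{n \to \infty} v_n(x) - \varepsilon .
\]
```

In the second statement the inferior limit sits inside the expectation, so the guarantee concerns the long-run average along each realized path rather than the limit of expected averages.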

Keywords: dynamic programming; Markov decision processes; partial observation; uniform value; long-run average payoff