IEEE Transactions on Automatic Control, Vol.64, No.10, 4137-4152, 2019
Thompson Sampling for Stochastic Control: The Continuous Parameter Case
Thompson sampling has recently been shown to achieve good theoretical performance guarantees for stochastic control problems with parameter uncertainty when the state, control, and parameter spaces are all finite. Much less is known, however, about the performance of Thompson sampling on continuous or more general spaces, a setting that covers an important class of problems in practice. In this paper, we study Thompson sampling applied to a broad class of average-cost stochastic control problems in which the state, control, and parameter spaces are all general measurable spaces. Our main contribution is to establish theoretical performance guarantees for Thompson sampling as measured by, first, the expected posterior sampling error and, second, the average per-period regret.
Keywords: Average regret bounds; Bayesian learning; general parameter spaces; posterior convergence rate; Thompson sampling
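
To make the two performance measures named in the abstract concrete, the following is a minimal sketch of Thompson sampling with a continuous parameter. It is an illustration only, not the paper's model or algorithm. Everything specific in it is an assumption chosen for simplicity: a scalar unknown parameter `theta_true` with a Gaussian prior, noisy observations `y = theta_true + noise`, a per-period cost `(u - theta_true)**2` (so the optimal average cost is zero), and the hypothetical function name `thompson_sampling`.

```python
import random


def thompson_sampling(theta_true, horizon, noise_std=1.0,
                      prior_mean=0.0, prior_var=1.0, seed=0):
    """Toy Thompson sampling loop with a continuous (real-valued) parameter.

    Assumed model (illustrative only): cost (u - theta)^2 per period and
    Gaussian observations of theta, so the posterior stays Gaussian.
    """
    rng = random.Random(seed)
    mean, var = prior_mean, prior_var
    total_regret = 0.0
    avg_regret = []        # running average per-period regret
    sampling_error = []    # |sampled parameter - true parameter|

    for t in range(1, horizon + 1):
        # 1. Sample a parameter from the current posterior.
        theta_sample = rng.gauss(mean, var ** 0.5)
        sampling_error.append(abs(theta_sample - theta_true))

        # 2. Apply the control that is optimal for the sampled parameter.
        u = theta_sample

        # 3. Incur cost; the optimal cost is 0, so the cost is the regret.
        total_regret += (u - theta_true) ** 2
        avg_regret.append(total_regret / t)

        # 4. Observe a noisy signal and do the conjugate Gaussian update.
        y = theta_true + rng.gauss(0.0, noise_std)
        precision = 1.0 / var + 1.0 / noise_std ** 2
        mean = (mean / var + y / noise_std ** 2) / precision
        var = 1.0 / precision

    return mean, var, avg_regret, sampling_error


if __name__ == "__main__":
    mean, var, avg_regret, err = thompson_sampling(theta_true=0.7, horizon=2000)
    print("posterior mean:", mean)
    print("posterior variance:", var)
    print("average per-period regret:", avg_regret[-1])
```

In this toy setting the observations do not depend on the control, so learning and control decouple. The general problems studied in the paper do not have this property, and the sketch should not be read as reflecting the paper's assumptions or bounds.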