IEEE Transactions on Automatic Control, Vol. 64, No. 5, pp. 2045-2052, 2019
Observation-Based Optimization for POMDPs With Continuous State, Observation, and Action Spaces
This paper considers the optimization problem for partially observable Markov decision processes (POMDPs) with continuous state, observation, and action spaces. POMDPs with discrete spaces have emerged as a promising approach to decision systems with imperfect state information. However, many recent applications of POMDPs involve continuous states, observations, and actions. For such problems, because the belief space is infinite-dimensional, existing studies usually discretize the continuous spaces using sufficient or nonsufficient statistics, which may cause the curse of dimensionality and performance degradation. In this paper, based on sensitivity analysis of the performance criteria, we develop a simulation-based policy iteration algorithm that finds a locally optimal observation-based policy for POMDPs with continuous spaces. The proposed algorithm requires no specific assumptions or prior information and has low computational complexity. A numerical example on a complicated multiple-input multiple-output (MIMO) beamforming problem shows that the algorithm achieves a significant performance improvement.
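The algorithm itself is not specified in this summary, so the following is only a rough illustration of two ideas it relies on: an observation-based policy (the action is computed from the current observation, with no belief state) and policy improvement driven purely by simulation. Everything here is a hypothetical stand-in: the scalar linear-Gaussian toy model, the linear policy u = -k*y, the helper names `average_cost` and `improve_gain`, and all parameter values. The improvement step is plain finite-difference gradient descent with common random numbers, not the sensitivity-analysis-based policy iteration described above.

```python
import random

# Hypothetical toy POMDP (not from the paper): scalar linear-Gaussian system
#   x' = A*x + u + w,   observation y = x + v,   per-step cost x^2 + R*u^2.
A, SIGMA_W, SIGMA_V, R = 0.9, 1.0, 0.5, 0.1


def average_cost(k, seed, horizon=5000, burn_in=200):
    """Estimate the long-run average cost of the observation-based policy
    u = -k * y by simulating one long trajectory."""
    rng = random.Random(seed)  # same seed -> common random numbers across calls
    x, total = 0.0, 0.0
    for t in range(burn_in + horizon):
        y = x + rng.gauss(0.0, SIGMA_V)  # noisy observation of the hidden state
        u = -k * y                       # action uses the observation only, no belief
        if t >= burn_in:                 # discard the initial transient
            total += x * x + R * u * u
        x = A * x + u + rng.gauss(0.0, SIGMA_W)
    return total / horizon


def improve_gain(k=0.0, iters=20, step=0.01, delta=0.05, k_max=1.5):
    """Simulation-based improvement of the policy gain k via
    finite-difference gradient descent (a stand-in, not the paper's method)."""
    for i in range(iters):
        seed = 1000 + i  # both perturbed runs share one noise sequence
        grad = (average_cost(k + delta, seed) - average_cost(k - delta, seed)) / (2 * delta)
        # Project onto [0, k_max], where the closed loop x' = (A - k)x + ... stays stable.
        k = min(max(k - step * grad, 0.0), k_max)
    return k


if __name__ == "__main__":
    k_star = improve_gain(0.0)
    print(k_star, average_cost(0.0, seed=7), average_cost(k_star, seed=7))
```

Sharing a seed between the two perturbed simulations (common random numbers) keeps the finite-difference estimate from being swamped by simulation noise, which is the usual reason simulation-based methods can work without a model of the belief dynamics.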
Keywords: Continuous spaces; no prior information; partially observable Markov decision process (POMDP); sensitivity analysis; simulation-based optimization