IEEE Transactions on Automatic Control, Vol.60, No.11, 2989-2993, 2015
Classification-Based Approximate Policy Iteration
Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees on the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results.
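To make the classification-based policy-iteration idea concrete, here is a minimal sketch (not the paper's CAPI algorithm): at each iteration, Monte Carlo rollouts under the current policy score each action at a set of states, the greedy action becomes a training label, and a simple nearest-neighbor "classifier" fitted to those labels becomes the next policy. The chain MDP, the constants, and all helper names below are illustrative assumptions, not from the paper.

```python
import random

# Hypothetical toy MDP (illustrative only): chain of states 0..N,
# actions -1/+1, reward 1 on reaching the terminal state N.
N = 10
GAMMA = 0.9

def step(s, a):
    s2 = max(0, min(N, s + a))
    return s2, (1.0 if s2 == N else 0.0)

def rollout_value(s, a, policy, horizon=20):
    # Monte Carlo estimate of Q(s, a): take action a once, then follow `policy`.
    s, r = step(s, a)
    total, disc = r, 1.0
    if s == N:
        return total
    for _ in range(horizon):
        disc *= GAMMA
        s, r = step(s, policy(s))
        total += disc * r
        if s == N:
            break
    return total

def capi_sketch(iters=10, rollouts=8):
    policy = lambda s: random.choice((-1, 1))  # arbitrary initial policy
    for _ in range(iters):
        labels = {}
        for s in range(N + 1):
            # Policy evaluation step: averaged rollouts per action.
            q = {a: sum(rollout_value(s, a, policy) for _ in range(rollouts)) / rollouts
                 for a in (-1, 1)}
            labels[s] = max(q, key=q.get)  # greedy action = classification label
        # Policy improvement step: a nearest-neighbor classifier fitted to the
        # labels stands in for any supervised learner over a policy class.
        policy = lambda s, L=dict(labels): L[min(L, key=lambda k: abs(k - s))]
    return policy

pol = capi_sketch()
print([pol(s) for s in range(N)])  # the learned policy moves right toward the reward
```

The key structural point the sketch shows is that the improvement step never represents a value function over the whole state space; it only needs action labels at sampled states, so any classifier (and hence nonparametric policy representations) can be plugged in.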