화학공학소재연구정보센터
IEEE Transactions on Automatic Control, Vol.49, No.6, 1022-1025, 2004
Dynamically identifying regenerative cycles in simulation-based optimization algorithms for Markov chains
Simulation-based algorithms for maximizing the average reward of a parameterized Markov chain often rely upon the existence of a state which is recurrent for all choices of parameter values. The question of which recurrent state should serve to mark the end of a regenerative cycle is a very important practical consideration in applications. Even when all of the states of the process are recurrent, some states tend to be visited more often than others, and lengthy renewal cycles tend to result in high variance estimates of the gradient. To address this difficulty, we analyze a mechanism for adjusting this special state dynamically (i*-adaptation) as applied to the "batch" simulation-based optimization algorithm of a previous paper. We show that the desirable convergence properties of the original "batch" algorithm are retained with i*-adaptation, namely the almost sure convergence of the parameter vector to a critical point.