화학공학소재연구정보센터
IEEE Transactions on Automatic Control, Vol.58, No.1, 32-46, 2013
Dynamic Optimization and Learning for Renewal Systems
This paper considers optimization of time averages in systems with variable length renewal frames. Applications include power-aware and profit-aware scheduling in wireless networks, peer-to-peer networks, and transportation systems. Every frame, a new policy is implemented that affects the frame size and that creates a vector of attributes. The policy can be a single decision in response to a random event observed on the frame, or a sequence of such decisions. The goal is to choose policies on each frame in order to maximize a time average of one attribute, subject to additional time average constraints on the others. Two algorithms are developed, both based on Lyapunov optimization concepts. The first makes decisions to minimize a "drift-plus-penalty" ratio over each frame. The second is similar but does not involve a ratio. For systems that make only a single decision on each frame, both algorithms are shown to learn efficient behavior without a-priori statistical knowledge. The framework is also applicable to large classes of constrained Markov decision problems. Such problems are reduced to finding an approximate solution to a simpler unconstrained stochastic shortest path problem on each frame. Approximations for the simpler problem may still suffer from a curse of dimensionality for systems with large state space. However, our results are compatible with any approximation method, and demonstrate an explicit tradeoff between performance and convergence time.