Abstract
Bayesian optimization (BO) is a sequential decision-making strategy for efficiently finding the global optimum of black-box optimization problems. In chemical engineering, where data generation through experiments is expensive, decisions guided by BO are crucial. However, widely used standard BO methods are one-step optimal: conventional BO considers only the immediate improvement from the next experiment. Yet virtually no real-world problem is solved in a single iteration, so a strategy that makes decisions while looking N steps ahead is required. This work proposes a reinforcement learning (RL)-based BO to achieve this goal. A sequence of experiments can be viewed as a stochastic dynamic programming (SDP) problem, and RL is a method for solving SDPs near-optimally. In this research, the experiment horizon was assumed to be fixed, and the efficiency of the optimization algorithms was compared on benchmark functions. Because the number of experiments is limited, the proposed RL-based BO uses a shrinking horizon for its lookahead decisions. The results show that the proposed RL-based BO achieves superior data efficiency compared to expected improvement (EI)-based BO and other conventional methods.
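To make the contrast with the proposed method concrete, the following is a minimal sketch of the myopic loop that standard EI-based BO performs: at each iteration, only the one-step expected improvement over the current best observation is maximized, with no lookahead over the remaining budget. The toy objective, candidate grid, and budget are illustrative assumptions, not the implementation studied in this work.

```python
# Minimal sketch of one-step (myopic) EI-based BO on a 1-D toy problem.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, f_best, xi=0.01):
    # EI for minimization: expected amount by which a candidate
    # improves on the best value observed so far.
    imp = f_best - mu - xi
    z = np.where(sigma > 0, imp / sigma, 0.0)
    return np.where(sigma > 0, imp * norm.cdf(z) + sigma * norm.pdf(z), 0.0)

f = lambda x: np.sin(3 * x) + 0.5 * x**2            # toy black-box objective
X = np.random.uniform(-2, 2, size=(3, 1))           # initial design points
y = f(X).ravel()
cands = np.linspace(-2, 2, 200).reshape(-1, 1)      # candidate pool

for _ in range(10):                                 # fixed experiment budget
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(cands, return_std=True)
    # One-step-optimal choice: pick the single next experiment greedily,
    # ignoring how it affects the remaining iterations.
    x_next = cands[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))

print("best observed value:", y.min())
```

The RL-based alternative replaces the greedy argmax above with a policy trained to account for the shrinking number of experiments remaining in the budget.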