Chemical Engineering Research & Design, Vol.87, No.10A, 1420-1429, 2009
Robust QSAR model development in high-throughput catalyst discovery based on genetic parameter optimisation
High-throughput strategies are gaining importance in catalyst formulation and discovery. The increased experimental capacity produces valuable data from which quantitative structure-activity relationship (QSAR) models can be developed to link catalyst composition and structure with final performance. Various QSAR modelling algorithms are available; however, they are generally configurable, and their performance is highly dependent on the correct choice of parameters. With the proliferation and increasing sophistication of integrated data-mining tools, there is a need for systematic, robust, and generic parameter optimisation methods. This paper investigates a genetic algorithm (GA) for optimising the parameters of several QSAR methods for classification and regression, including feed-forward neural networks, decision tree generators, and support vector machines, with cross-validation providing the performance estimate. The methods were applied to four datasets: three from recent reports of high-throughput studies and one from our own laboratory. The results confirm that parameter optimisation is a critical step in QSAR modelling and demonstrate the effectiveness of the GA approach. The best results were shared among the modelling methods, emphasising the importance of considering more than one type of model. (C) 2009 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
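To make the general idea concrete, here is a minimal sketch of GA-based parameter optimisation with cross-validation accuracy as the fitness. It is not the authors' implementation. To keep the example dependency-free, a k-nearest-neighbour classifier stands in for the neural networks, decision trees, and SVMs studied in the paper. Its two tunable parameters are `k` (number of neighbours) and `p` (Minkowski exponent). Every name here (`make_data`, `cv_accuracy`, `genetic_search`, `K_RANGE`, `P_CHOICES`) is hypothetical, the synthetic two-class data only stands in for catalyst descriptors, and the GA operators (tournament selection, uniform crossover, elitism) are illustrative choices.

```python
import random

# Hypothetical search space: k-NN neighbours k and Minkowski exponent p,
# standing in for the tunable parameters of a QSAR model.
K_RANGE = (1, 15)
P_CHOICES = (1, 2)


def make_data(n=60, seed=0):
    """Synthetic two-class data (illustrative stand-in for catalyst descriptors)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.5
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k, p):
    """Majority vote among the k nearest points; the p-th power of the
    Minkowski distance gives the same neighbour ordering as the distance itself."""
    dists = sorted((sum(abs(a - b) ** p for a, b in zip(xi, x)), yi) for xi, yi in train)
    votes = [yi for _, yi in dists[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(data, k, p, folds=5):
    """k-fold cross-validation accuracy, used as the GA fitness."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th point, starting at offset f
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k, p) == y for x, y in test)
    return correct / len(data)  # each point is tested exactly once


def random_individual(rng):
    return (rng.randint(*K_RANGE), rng.choice(P_CHOICES))


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal odds."""
    return tuple(ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b))


def mutate(ind, rng, rate=0.3):
    """Nudge k within its bounds and/or resample p."""
    k, p = ind
    if rng.random() < rate:
        k = min(max(k + rng.choice((-2, -1, 1, 2)), K_RANGE[0]), K_RANGE[1])
    if rng.random() < rate:
        p = rng.choice(P_CHOICES)
    return (k, p)


def genetic_search(data, pop_size=10, generations=8, seed=1):
    """Return the best (k, p) found and its cross-validation accuracy."""
    rng = random.Random(seed)
    cache = {}

    def fitness(ind):
        if ind not in cache:  # avoid re-running CV for repeated individuals
            cache[ind] = cv_accuracy(data, *ind)
        return cache[ind]

    def tournament(pool):
        return max(rng.sample(pool, 3), key=fitness)

    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best unchanged
        while len(next_pop) < pop_size:
            child = crossover(tournament(ranked), tournament(ranked), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    data = make_data()
    (k, p), acc = genetic_search(data)
    print(f"best k={k}, p={p}, CV accuracy={acc:.2f}")
```

Caching the fitness matters in practice. Each evaluation is a full cross-validation run, which for neural networks or SVMs is far more expensive than this toy k-NN. Elitism ensures the best parameter set found so far is never lost between generations.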