Industrial & Engineering Chemistry Research, Vol.58, No.8, 3082-3092, 2019
Machine Learning Derived Quantitative Structure Property Relationship (QSPR) to Predict Drug Solubility in Binary Solvent Systems
Predicting drug solubility is a crucial problem in the pharmaceutical industry for both drug delivery and drug discovery. Several theoretical approaches have been proposed to predict drug solubility in mixed solvent systems when the solubility values in the pure solvents are known. Quantitative structure property relationship (QSPR) approaches are gaining attention for predicting various physical properties because of their robustness and computational tractability. In this work, a machine learning based QSPR approach is proposed to predict drug solubility in binary solvent systems using structural features such as molar refractivity, McGowan volume, and topological surface area. A genetic algorithm based feature selection procedure is used to check the dependency between the selected features and to obtain the final set of significant features. Initially, solubility is assumed to vary linearly with the structural features, and the model parameters are estimated using ordinary least squares and a weight-based optimization approach. Subsequently, solubility is assumed to be piecewise linear in the structural features, and the parameters of multiple models (MM) are identified using a prediction error based clustering approach, a machine learning technique. The efficacy of the proposed approaches is demonstrated on drug solubility data collected from the literature. For comparison with the proposed MM approach, neural network based nonlinear models with different configurations, trained with the Levenberg-Marquardt algorithm, are also tested. A novel testing strategy is also proposed to identify a suitable model for a test sample when the model parameters are obtained using the prediction error based clustering approach.
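
The abstract does not spell out the prediction error based clustering step, so the following is only a minimal sketch of one common way such a scheme can be set up (clusterwise linear regression), not the authors' implementation. It alternates between fitting one ordinary least-squares model per cluster and reassigning each sample to the model that predicts it with the smallest squared error. The function name `fit_multiple_models`, its parameters, and the random initialization are assumptions made for illustration. The paper's testing strategy for assigning an unseen sample to a model is not described in the abstract and is not reproduced here.

```python
import numpy as np


def fit_multiple_models(X, y, n_models=2, n_iter=50, seed=0):
    """Hypothetical sketch: piecewise-linear fit by prediction-error clustering.

    X: (n_samples, n_features) structural descriptors
    y: (n_samples,) solubility values
    Returns (coefs, labels); each row of coefs holds the feature weights
    followed by the intercept of one linear model.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])          # append intercept column
    labels = rng.integers(0, n_models, size=n)    # random initial clusters (assumption)
    coefs = np.zeros((n_models, Xb.shape[1]))

    for _ in range(n_iter):
        # Step 1: ordinary least squares on the samples currently in each cluster.
        for k in range(n_models):
            mask = labels == k
            if mask.sum() >= Xb.shape[1]:         # skip clusters with too few samples
                coefs[k], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)

        # Step 2: move each sample to the model with the smallest squared error.
        errors = (Xb @ coefs.T - y[:, None]) ** 2  # shape (n_samples, n_models)
        new_labels = errors.argmin(axis=1)

        if np.array_equal(new_labels, labels):    # assignments stable: converged
            break
        labels = new_labels

    return coefs, labels
```

Because both steps can only lower the total squared error, the loop terminates at a local optimum that depends on the initial assignment; in practice one would likely rerun it with several seeds and keep the best fit.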