Chemical Engineering Science, Vol.66, No.19, 4356-4369, 2011
Enhanced inter-helical residue contact prediction in transmembrane proteins
This paper builds on recent work by McAllister and Floudas (2008), who developed a mathematical optimization model to predict the contacts in transmembrane alpha-helical proteins from a limited protein data set. We enhance this method by (1) building a more comprehensive data set of transmembrane alpha-helical proteins, which is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction; (2) enhancing the mathematical model by modifying several important physical constraints; and (3) applying a new blind contact prediction scheme to different protein sets, a scheme proposed after analyzing the contact prediction on 65 proteins from Fuchs et al. (2009). The blind contact prediction scheme was tested on two different membrane protein sets. First, it was applied to five carefully selected proteins from the training set. For each of these five proteins, the probability sets were built with the target protein excluded from the training set, and an average accuracy of 56% was obtained. Second, it was applied to six independent membrane proteins with complicated topologies, giving prediction accuracies of 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit; Lo et al., 2009) and is shown to achieve better prediction accuracy. (C) 2011 Elsevier Ltd. All rights reserved.
Keywords: Residue contact prediction; Membrane proteins; Mixed integer linear optimization; Protein structure prediction; Data mining; Mathematical Modeling
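
The 60.7% average quoted in the abstract for the six independent proteins can be checked directly from the per-protein accuracies. The short sketch below is only an arithmetic check, not part of the authors' method; the names `accuracies` and `mean_accuracy` are illustrative, and it assumes the reported average is an unweighted mean of the six percentages.

```python
# Per-protein prediction accuracies (%) as reported in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}


def mean_accuracy(values):
    """Unweighted arithmetic mean of a collection of percentages (assumed)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_accuracy(accuracies.values())
print(f"{average:.1f}%")  # prints 60.7%
```

The six values sum to 364, so the unweighted mean is 364 / 6, which rounds to the reported 60.7%.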