Industrial & Engineering Chemistry Research, Vol.46, No.3, 839-845, 2007
Mixed-integer linear programming algorithm for a computational protein design problem
The computational protein design problem or the side-chain positioning problem is a central part in computational methods for predicting protein structure and designing protein sequences. However, the computational protein design problem is NP-complete, and it is also NP-complete to find a reasonable approximate solution to this problem. Here, we present a critical finding that the network flow structure embedded in the integer linear programming formulation of the computational protein design problem makes it equivalent to a mixed-integer linear programming formulation with fewer binary variables. This novel formulation effectively reduces the combinatorial difficulty of the computational protein design problem, thereby allowing the sequence selection of the global minimum energy conformation for large proteins to become tractable by using the standard optimization algorithms. Our preliminary calculation results for 20 core redesigned proteins show that the mixed-integer linear programming algorithm with a validated physical model is faster than its integer linear programming counterpart by up to 3 times. More importantly, the specific mathematical structure underlying the mixed-integer linear programming formulation paves the way for developing even faster combinatorial searching strategies by using stronger cutting planes for mixed-integer programming.