Journal of Physical Chemistry B, Vol.108, No.43, 16950-16959, 2004
Optimization of the UNRES force field by hierarchical design of the potential-energy landscape. 3. Use of many proteins in optimization
We report the application of the hierarchical optimization method of protein potential-energy landscapes described in the accompanying papers (Liwo, A.; Arlukowicz, P.; Oldziej, S.; Czaplewski, C.; Makowski, M.; Scheraga, H. A. J. Phys. Chem. B 2004, 108, 16918; Oldziej, S.; Liwo, A.; Czaplewski, C.; Pillardy, J.; Scheraga, H. A. J. Phys. Chem. B 2004, 108, 16934) to optimize the UNRES potential energy function using two [1E0G (alpha + beta) and 1E0L (beta)], three [1E0G, 1E0L, and 1GAB (alpha)], and, finally, four training proteins [1E0G, 1E0L, 1GAB, and 1IGD (alpha + beta)] simultaneously; these training sets and the resulting force fields are referred to as 2P, 3P, and 4P, respectively. The hierarchies of 1E0L and 1GAB were determined following the procedure applied to 1E0G in an accompanying paper (ref 2); the hierarchies of 1IGD and 1E0G were taken from experiment and from the accompanying paper (ref 2), respectively. For all training sets, optimization was successful; in other words, (i) the target function composed of contributions from each set of training proteins could be optimized and (ii) the resulting force fields located the nativelike structures of each of the training proteins as the lowest in energy in a global conformational search, which means that hierarchical optimization with multiple training proteins is feasible. Subsequently, the 3P and 4P force fields were tested on a set of 66 proteins (26 alpha-, 15 beta-, and 25 alpha + beta-proteins with chain lengths from 28 to 144 amino acid residues). Both force fields perform comparably on the alpha-proteins, but the 4P force field performs definitely better on the alpha + beta- and beta-proteins.
With the 4P force field, the average length of a continuous segment matching the corresponding segment of the experimental structure within 6 Å rmsd and the percentage of correctly predicted chain length are 54 (67%), 34 (45%), 42 (55%), and 45 (58%) for the alpha-, beta-, alpha + beta-, and all proteins, respectively, and the length of the longest predicted continuous fragment is 96, 49, and 70 residues for the alpha-, beta-, and alpha + beta-proteins, respectively; with the 3P force field, the longest predicted fragment within the 6 Å rmsd cutoff was 127 residues (for the 144-residue 1LPE alpha-protein). These results are a major step forward with respect to our earlier attempts at optimizing the UNRES force field by maximizing the energy gap and Z score between the nativelike structures and the lowest-energy non-native structure. In that earlier work, for feasibility, we had to derive a separate force field for each structural class, and the predictive power of the force field derived to treat alpha-proteins was much greater than that of the force fields derived for the alpha + beta and the beta structural classes. However, the 4P force field definitely performs better on the alpha- and alpha + beta-proteins than on the beta-proteins, which strongly suggests that further improvements are needed, the most significant issue being to differentiate the conformations within structural levels depending on their nativelikeness, not only those that belong to different structural levels.
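The segment-based accuracy measure quoted above (the longest continuous segment that matches the experimental structure within a 6 Å rmsd cutoff, and the fraction of the chain it covers) can be illustrated with a minimal sketch. It is not the authors' code: it assumes C-alpha coordinates supplied as (n, 3) NumPy arrays with identical residue numbering, superposes every candidate window independently with the Kabsch algorithm, and searches all windows by brute force. The function names and the `min_len` parameter are hypothetical, and the evaluation protocol actually used in the paper may differ in detail.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition."""
    a = a - a.mean(axis=0)              # remove translation
    b = b - b.mean(axis=0)
    h = a.T @ b                         # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    # If the best orthogonal transform is a reflection, flip the smallest
    # singular value so that only proper rotations are allowed.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def longest_matching_segment(pred, ref, cutoff=6.0, min_len=3):
    """Return (start, length) of the longest window with rmsd <= cutoff.

    Assumption: pred[i] and ref[i] refer to the same residue.
    Returns (0, 0) if no window of at least min_len residues qualifies.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = len(pred)
    # Try the longest windows first; the first hit is the answer.
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            window = slice(start, start + length)
            if kabsch_rmsd(pred[window], ref[window]) <= cutoff:
                return start, length
    return 0, 0


def coverage_percent(segment_length: int, chain_length: int) -> float:
    """Percentage of the chain covered by the matching segment."""
    return 100.0 * segment_length / chain_length
```

Under these assumptions, `coverage_percent(127, 144)` gives roughly 88% for the 127-residue fragment of the 144-residue 1LPE reported above. The exhaustive window search is cubic in chain length, which remains affordable for chains of up to 144 residues such as those in the test set.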