Chemical Engineering and Materials Research Information Center (CHERIC)
Journal of the American Chemical Society, Vol.126, No.20, 6258-6273, 2004
Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments
The major rate-limiting step in high-throughput NMR protein structure determination involves the calculation of a reliable initial fold, the elimination of incorrect nuclear Overhauser enhancement (NOE) assignments, and the resolution of NOE assignment ambiguities. We present a robust approach to automatically calculate structures with a backbone coordinate accuracy of 1.0-1.5 Å from datasets in which as much as 80% of the long-range NOE information (i.e., between residues separated by more than five positions in the sequence) is incorrect. The current algorithm differs from previously published methods in that it has been expressly designed to ensure that the results from successive cycles are not biased by the global fold of structures generated in preceding cycles. Consequently, the method is highly error tolerant and is not easily funnelled down an incorrect path in either three-dimensional structure or NOE assignment space. The algorithm incorporates three main features: a linear energy function representation of the NOE restraints, which allows the number of simultaneously satisfied restraints to be maximized during the course of simulated annealing; a method for handling multiple possible assignments for each NOE cross-peak, which avoids local minima by treating each possible assignment as if it were an independent restraint; and a probabilistic method that permits both inactivation and reactivation of all NOE restraints on the fly during the course of simulated annealing. NOE restraints are never removed permanently, which significantly reduces the likelihood of becoming trapped in a false minimum of NOE assignment space. The effectiveness of the algorithm is demonstrated using completely automatically peak-picked experimental NOE data from two proteins: interleukin-4 (136 residues) and cyanovirin-N (101 residues). The limits of the method are explored using simulated data on the 56-residue B1 domain of Streptococcal protein G.
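
The abstract gives no functional forms, so the following is only a minimal illustrative sketch of two of the ideas it names: a penalty that grows linearly with the restraint violation, and a probabilistic rule that re-draws the active state of every restraint so that none is discarded permanently. The function names, the exponential activation rule, and the `slope` and `tolerance` parameters are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def linear_noe_penalty(distance, upper_bound, slope=1.0):
    """Penalty that is zero when the restraint is satisfied and grows
    linearly (not quadratically) with the size of the violation.
    The slope value is an illustrative assumption."""
    return slope * max(0.0, distance - upper_bound)


def activation_probability(violation, tolerance=0.5):
    """Hypothetical rule: the chance that a restraint is active decays
    exponentially with its current violation (in the same units as
    tolerance). The paper's actual rule may differ."""
    return math.exp(-max(0.0, violation) / tolerance)


def update_active_flags(violations, rng, tolerance=0.5):
    """Re-draw the active flag of every restraint from its current
    violation. Flags are recomputed from scratch each time, so a
    restraint that is inactive now can become active again later."""
    return [rng.random() < activation_probability(v, tolerance)
            for v in violations]


if __name__ == "__main__":
    rng = random.Random(0)
    # Violations (model distance minus upper bound) for four restraints.
    violations = [0.0, 0.2, 1.5, 6.0]
    print(update_active_flags(violations, rng))
```

With a linear penalty, a single badly violated restraint contributes far less energy than it would under a quadratic form, so a handful of incorrect restraints is less able to outweigh many correct ones. Because the flags are recomputed at every update rather than stored as permanent deletions, a restraint switched off early in the annealing can be switched back on once the structure becomes consistent with it.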