Journal of the American Chemical Society, Vol. 121, No. 39, 9008-9012, 1999
R-factor, free R, and complete cross-validation for dipolar coupling refinement of NMR structures
NMR structure determination of macromolecules involves a minimization problem in which atomic models, subject to restraints relating to idealized covalent geometry and nonbonded contacts, are fitted to experimental observables. The latter comprise restraints between atoms separated by 6 Å or less, such as NOE-derived interproton distances, torsion angles, coupling constants, and chemical shifts, as well as restraints that provide direct information on long-range order, such as dipolar couplings. An expression for the dipolar coupling R-factor is derived which provides a quantitative and readily interpretable measure of the agreement between observed and calculated dipolar couplings. The dipolar R-factor is the ratio of the rms difference between observed and calculated values to the rms difference expected for a totally random distribution of vectors. This expected value can be calculated exactly from the magnitude of the alignment tensor. The dipolar R-factor ranges from 0 to 1, where a value of 0 indicates perfect agreement between observed and calculated dipolar couplings, and a completely random structure yields a value of 1. The dipolar coupling R-factor is readily amenable to complete cross-validation, with multiple pairs of working and test data sets, thereby permitting one to assess the quality of the fit to the experimental dipolar couplings and to avoid overfitting the experimental data. The application of the dipolar R-factor with complete cross-validation is demonstrated using experimental data for the protein cyanovirin-N.
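The ratio described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the expression itself, so the denominator here is an assumption: for a uniformly random distribution of vectors, the mean-square dipolar coupling of an alignment tensor with axial component Da and rhombicity R is taken as Da^2 (4 + 3 R^2) / 5, and the mean-square difference between two uncorrelated random sets as twice that value. The function names, parameter names, and the interleaved partitioning scheme are hypothetical and are not taken from the paper.

```python
import math
import random


def dipolar_r_factor(d_obs, d_calc, d_a, rhombicity):
    """Ratio of the observed rms difference to the rms difference expected
    for a random distribution of vectors (assumed form, see lead-in)."""
    if len(d_obs) != len(d_calc) or len(d_obs) == 0:
        raise ValueError("d_obs and d_calc must be non-empty and equal in length")
    # Mean-square difference between observed and calculated couplings.
    mean_sq_diff = sum((o - c) ** 2 for o, c in zip(d_obs, d_calc)) / len(d_obs)
    # Assumed expectation for two uncorrelated random sets:
    # 2 * <D^2>, with <D^2> = Da^2 * (4 + 3 R^2) / 5.
    random_mean_sq_diff = 2.0 * d_a ** 2 * (4.0 + 3.0 * rhombicity ** 2) / 5.0
    return math.sqrt(mean_sq_diff / random_mean_sq_diff)


def cross_validation_partitions(n_couplings, n_sets, seed=0):
    """Split coupling indices into (working, test) pairs so that every
    coupling appears in exactly one test set (hypothetical scheme)."""
    indices = list(range(n_couplings))
    random.Random(seed).shuffle(indices)
    # Interleaved slices give n_sets disjoint test sets of near-equal size.
    test_sets = [indices[i::n_sets] for i in range(n_sets)]
    all_indices = set(indices)
    return [(sorted(all_indices - set(test)), sorted(test)) for test in test_sets]
```

In a cross-validated refinement, the structure would be refined against each working set in turn and the R-factor evaluated on the corresponding test set to give a free R; the refinement step is outside the scope of this sketch.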