Journal of Physical Chemistry B, Vol. 122, No. 39, 9087-9101, 2018
Generating Intrinsically Disordered Protein Conformational Ensembles from a Database of Ramachandran Space Pair Residue Probabilities Using a Markov Chain
Intrinsically disordered proteins (IDPs), which are involved in regulatory pathways and cell signaling, sample a range of conformations. Constructing structural ensembles of IDPs is a difficult task for both experiment and simulation. In this work, we produce potential IDP ensembles using an existing database of pair residue phi and psi angle probabilities chosen from the turn, coil, and bend parts of sequences in the Protein Data Bank. For all residue pair types, a k-means-based discretization is used to create a set of rotamers and their probabilities in this pair Ramachandran space. For a given sequence, a Markov-based probabilistic algorithm is used to create Ramachandran space database-Markov ensembles, which are then converted to Cartesian coordinates of the backbone atoms. From these Cartesian coordinates and the phi and psi dihedral angles of a sequence, various observables are evaluated: the radius of gyration and shape parameters, the distance probability distribution that is related to the small-angle X-ray scattering intensity, atom-atom contact percentages, local structural information, NMR three-bond J couplings, CA chemical shifts, and residual dipolar couplings. A benchmark set of ensembles for 16-residue regular sequences is constructed and used to validate the method and to explore the implications of the database for some of the above-mentioned observables. We then examine a set of nonapeptides of the form EGAAXAASS, where X denotes residues of different character. These peptides were previously studied by NMR, and subsequent molecular dynamics (MD) simulations were carried out with various force fields to find which one best agrees with the NMR data. Our analysis of these peptides shows that the combination of the database and the Markov algorithm yields ensembles that agree very well with the NMR and MD results for the above-listed observables. Thus, the database-Markov method is a promising approach for generating IDP conformational ensembles.
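The abstract does not spell out the sampling rule, so the following Python sketch shows only one plausible reading of a Markov-based walk over discretized pair probabilities, not the authors' implementation. It assumes each residue occupies one of a few discrete (phi, psi) states, and that a joint probability table is available for every adjacent residue pair. The names `sample_chain`, `sample_ensemble`, `toy_tables`, and `CENTERS` are hypothetical. The three state centers are rough illustrative values, and the tables are random placeholders rather than database entries.

```python
import numpy as np

# Illustrative (phi, psi) state centers in degrees; NOT taken from the database.
CENTERS = np.array([[-63.0, -43.0], [-120.0, 130.0], [-75.0, 145.0]])


def toy_tables(n_residues, n_states, rng):
    """Build random placeholder joint tables, one per adjacent residue pair."""
    tables = rng.random((n_residues - 1, n_states, n_states))
    return tables / tables.sum(axis=(1, 2), keepdims=True)


def sample_chain(pair_tables, rng):
    """Sample one state index per residue with a first-order Markov walk.

    pair_tables[i][a, b] is assumed to be the joint weight of residue i
    being in state a and residue i + 1 being in state b.
    """
    # Start from the marginal distribution of the first residue.
    marginal = pair_tables[0].sum(axis=1)
    states = [int(rng.choice(len(marginal), p=marginal / marginal.sum()))]
    for table in pair_tables:
        # Condition the next residue on the state just chosen.
        row = table[states[-1]]
        if row.sum() == 0:
            raise ValueError("current state has no allowed successor")
        states.append(int(rng.choice(len(row), p=row / row.sum())))
    return states


def sample_ensemble(pair_tables, centers, n_conformers, seed=0):
    """Return an array of dihedral angles, shape (n_conformers, n_residues, 2)."""
    rng = np.random.default_rng(seed)
    chains = [sample_chain(pair_tables, rng) for _ in range(n_conformers)]
    return centers[np.array(chains)]


# Usage: a 9-residue chain, matching the length of the EGAAXAASS peptides.
tables = toy_tables(9, len(CENTERS), np.random.default_rng(1))
ensemble = sample_ensemble(tables, CENTERS, n_conformers=100)
```

Each conformer in `ensemble` is a list of (phi, psi) pairs. Converting these to backbone Cartesian coordinates and computing observables such as the radius of gyration would be separate steps that this sketch does not cover.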