Journal of Physical Chemistry B, Vol.123, No.16, 3462-3474, 2019
Sequence Effects on Size, Shape, and Structural Heterogeneity in Intrinsically Disordered Proteins
Intrinsically disordered proteins (IDPs) lack well-defined three-dimensional structures, thus challenging the archetypal notion of structure-function relationships. Determining the ensemble of conformations that IDPs explore under physiological conditions is the first step toward understanding their diverse cellular functions. Here, we quantitatively characterize the structural features of IDPs as a function of sequence and length using coarse-grained simulations. For diverse IDP sequences, with the number of residues ($N_T$) ranging from 20 to 441, our simulations not only reproduce the radii of gyration ($R_g$) obtained from experiments, but also predict the full scattering intensity profiles in excellent agreement with small-angle X-ray scattering experiments. The $R_g$ values are well-described by the standard Flory scaling law, $R_g = R_g^0 N_T^{\nu}$, with $\nu \approx 0.588$, making it tempting to assert that IDPs behave as polymers in a good solvent. However, clustering analysis reveals that the menagerie of structures explored by IDPs is diverse, with the extent of heterogeneity being highly sequence-dependent, even though ensemble-averaged properties, such as the dependence of $R_g$ on chain length, may suggest synthetic polymer-like behavior in a good solvent. For example, we show that for the highly charged Prothymosin-α, a substantial fraction of conformations is highly compact. Even if the sequence compositions are similar, as is the case for α-Synuclein and a truncated construct from the Tau protein, there are substantial differences in the conformational heterogeneity. Taken together, these observations imply that metrics based on net charge or related quantities alone cannot be used to anticipate the phases of IDPs, either in isolation or in complex with partner IDPs or RNA.
Our work sets the stage for probing the interactions of IDPs with each other, with folded protein domains, or with partner RNAs, which are critical for describing the structures of stress granules and biomolecular condensates with important cellular functions.
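
The Flory scaling law quoted in the abstract, $R_g = R_g^0 N_T^{\nu}$, becomes a straight line in log-log space, so the exponent can be estimated by linear regression. The Python sketch below illustrates that idea only and is not the authors' analysis: the function name `fit_flory`, the prefactor of 0.2 nm, and the chain lengths are hypothetical, and the input values are synthetic, generated from the law itself rather than taken from the paper.

```python
import numpy as np


def fit_flory(n_residues, rg_values):
    """Fit log(R_g) = log(R_g0) + nu * log(N_T) by least squares.

    Returns the prefactor R_g0 and the exponent nu.
    """
    log_n = np.log(np.asarray(n_residues, dtype=float))
    log_rg = np.log(np.asarray(rg_values, dtype=float))
    # A degree-1 polyfit returns [slope, intercept]; slope = nu.
    nu, log_rg0 = np.polyfit(log_n, log_rg, 1)
    return float(np.exp(log_rg0)), float(nu)


# Synthetic, noise-free inputs (NOT data from the paper): chain lengths
# spanning the 20-441 range mentioned in the abstract, with R_g values
# generated from the scaling law using an assumed prefactor.
chain_lengths = np.array([20, 50, 100, 200, 441])
assumed_rg0 = 0.2    # nm, hypothetical prefactor
assumed_nu = 0.588   # good-solvent Flory exponent quoted in the abstract
synthetic_rg = assumed_rg0 * chain_lengths ** assumed_nu

rg0_fit, nu_fit = fit_flory(chain_lengths, synthetic_rg)
print(f"R_g0 = {rg0_fit:.3f} nm, nu = {nu_fit:.3f}")
```

With real measurements, the fitted exponent would carry statistical uncertainty, and, as the abstract cautions, a value near 0.588 for the ensemble average does not by itself show that individual sequences behave like homopolymers in a good solvent.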