Journal of Physical Chemistry B, Vol.124, No.22, 4436-4445, 2020
RNAPosers: Machine Learning Classifiers for Ribonucleic Acid-Ligand Poses
Determining the three-dimensional (3D) structures of ribonucleic acid (RNA)-small molecule ligand complexes is critical to understanding molecular recognition in RNA. Computer docking can, in principle, be used to predict the 3D structure of RNA-small molecule complexes. Unfortunately, retrospective analysis has shown that the scoring functions that are typically used for pose prediction tend to misclassify nonnative poses as native and vice versa. Here, we use machine learning to train a set of pose classifiers that estimate the relative "nativeness" of a set of RNA-ligand poses. At the heart of our approach is the use of a pose "fingerprint" (FP) that is a composite of a set of atomic FPs, which individually encode the local "RNA environment" around ligand atoms. We found that by ranking poses based on classification scores from our machine learning classifiers, we were able to recover native-like poses better than when we ranked poses based on their docking scores. With a leave-one-out training and testing approach, we found that one of our classifiers could recover poses that were within 2.5 Å of the native poses in ~80% of the 80 cases we examined, and, on two separate validation sets, we could recover such poses in ~60% of the cases. Our set of classifiers, which we refer to as RNAPosers, should find utility as a tool to aid in RNA-ligand pose prediction, and so we make RNAPosers open to the academic community via https://github.com/atfrank/RNAPosers.
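
The comparison described above, ranking poses by classifier score versus docking score and counting how often a native-like pose is recovered, can be illustrated with a minimal Python sketch. It is not the RNAPosers implementation. The `Pose` fields, the score sign conventions (lower docking score is better, higher classifier score is more native-like), and the reading of "recover" as "the top-ranked pose lies within 2.5 Å of the native pose" are all assumptions made for illustration, and the numbers are invented toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

RMSD_CUTOFF = 2.5  # angstrom; threshold for a "native-like" pose, taken from the abstract


@dataclass
class Pose:
    rmsd_to_native: float    # RMSD to the native pose, in angstrom
    docking_score: float     # assumption: lower means better
    classifier_score: float  # assumption: higher means more native-like


def top_pose(poses: List[Pose], key: Callable[[Pose], float], reverse: bool) -> Pose:
    """Return the best-ranked pose under the given scoring key."""
    return sorted(poses, key=key, reverse=reverse)[0]


def recovery_rate(cases: Dict[str, List[Pose]],
                  key: Callable[[Pose], float],
                  reverse: bool) -> float:
    """Fraction of cases whose top-ranked pose is within RMSD_CUTOFF of native."""
    hits = sum(
        1 for poses in cases.values()
        if top_pose(poses, key, reverse).rmsd_to_native <= RMSD_CUTOFF
    )
    return hits / len(cases)


# Invented toy data: three hypothetical RNA-ligand cases, two poses each.
cases = {
    "case_a": [Pose(1.2, -5.0, 0.91), Pose(6.8, -7.5, 0.12)],
    "case_b": [Pose(2.1, -6.2, 0.40), Pose(4.0, -4.1, 0.75)],
    "case_c": [Pose(0.9, -3.0, 0.88), Pose(5.5, -8.0, 0.05)],
}

by_docking = recovery_rate(cases, key=lambda p: p.docking_score, reverse=False)
by_classifier = recovery_rate(cases, key=lambda p: p.classifier_score, reverse=True)

print(f"recovery by docking score:    {by_docking:.2f}")
print(f"recovery by classifier score: {by_classifier:.2f}")
```

In a leave-one-out setting, the classifier that scores each case would be trained on the remaining cases only. That training step is omitted here because the abstract does not specify the model or the fingerprint details.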