Journal of the American Chemical Society, Vol. 139, No. 49, 17870-17881, 2017
Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC
A method to predict the crystal structure of equiatomic ternary compositions based only on the constituent elements was developed using cluster resolution feature selection (CR-FS) and support vector machine (SVM) classification. The supervised machine-learning model was first trained with 1037 individual compounds that adopt the most populated ternary 1:1:1 structure types (TiNiSi-, ZrNiAl-, PbFCl-, LiGaGe-, YPtAs-, UGeTe-, and LaPtSi-type) and then validated using an additional 519 compounds. The CR-FS algorithm improves class discrimination and indicates that 113 variables, including size, electronegativity, number of valence electrons, and position in the periodic table (group number), influence the structure preference. The final model prediction sensitivity, specificity, and accuracy were 97.3%, 93.9%, and 96.9%, respectively, establishing that this method is capable of reliably predicting the crystal structure of a compound given only its composition. The power of CR-FS and SVM classification is further demonstrated by separating the crystal structures of polymorphs, specifically to examine polymorphism in TiNiSi- and ZrNiAl-type structures. Analyzing 19 compositions that are experimentally reported in both structure types, this machine-learning model correctly distinguishes, with high confidence (>0.7), the low-temperature polymorph from its high-temperature form. Interestingly, machine learning also reveals that certain compositions cannot be clearly differentiated and lie in a "confused" region (0.3-0.7 confidence), suggesting that both polymorphs may be observed in a single sample under certain experimental conditions. The ensuing synthesis and characterization of TiFeP adopting both TiNiSi- and ZrNiAl-type structures in a single sample, even after long annealing times (3 months), validate the occurrence of the region of structural uncertainty predicted by machine learning.
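The confidence regions and performance metrics mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the reported confidence can be read as the probability of one of the two structure types (here TiNiSi-type), so that values below 0.3 correspond to a confident assignment to the other type, and it uses the standard confusion-matrix definitions of sensitivity, specificity, and accuracy. The function names and the example counts are hypothetical.

```python
def confidence_region(p_tinisi: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a confidence value to a structure assignment.

    Assumption: p_tinisi is the model's confidence that a composition adopts
    the TiNiSi-type structure, on a 0-1 scale. The thresholds 0.3 and 0.7 are
    the bounds of the "confused" region quoted in the abstract.
    """
    if not 0.0 <= p_tinisi <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    if p_tinisi > high:
        return "TiNiSi-type"
    if p_tinisi < low:
        return "ZrNiAl-type"
    # Values from low to high (inclusive) cannot be clearly differentiated.
    return "confused"


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard definitions computed from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical confidence values, not taken from the paper.
    for p in (0.92, 0.55, 0.12):
        print(p, confidence_region(p))
    # Hypothetical counts, chosen only to show the arithmetic.
    print(classification_metrics(tp=8, fn=2, tn=9, fp=1))
```

In this reading, a composition falling in the "confused" band is a candidate for showing both polymorphs in one sample, which is the situation the abstract reports for TiFeP.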