Journal of Physical Chemistry A, Vol.120, No.51, 10231-10244, 2016
Massively Parallel Implementation of Explicitly Correlated Coupled-Cluster Singles and Doubles Using TiledArray Framework
A new distributed-memory massively parallel implementation of standard and explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) with canonical O(N^6) computational complexity is described. The implementation is based on the TiledArray tensor framework. Novel features of the implementation include (a) all data greater than O(N) is distributed in memory and (b) the mixed use of density fitting and integral-driven formulations, which optionally avoids storage of tensors with three and four unoccupied indices. Excellent strong scaling is demonstrated on a multicore shared-memory computer, a commodity distributed-memory computer, and a national-scale supercomputer. The performance on a shared-memory computer is competitive with the popular CCSD implementations in ORCA and Psi4. Moreover, the CCSD performance on a commodity-size cluster significantly improves on the state-of-the-art package NWChem. The large-scale parallel explicitly correlated coupled-cluster implementation makes accurate estimation of the coupled-cluster basis set limit routine for molecules with 20 or more atoms. Thus, it can provide valuable benchmarks for the emerging reduced-scaling coupled-cluster approaches. The new implementation allowed us to revisit the basis set limit for the CCSD contribution to the binding energy of the pi-stacked uracil dimer, a challenging paradigm of pi-stacking interactions from the S66 benchmark database. The revised value for the CCSD correlation binding energy, obtained with the help of quadruple-zeta CCSD computations, is -8.30 +/- 0.02 kcal/mol. It differs significantly from the S66 reference value, -8.50 kcal/mol, as well as from other CBS limit estimates in the recent literature.
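As a rough illustration of two numbers quoted in the abstract, the sketch below works out (1) how the cost of a canonical O(N^6) method grows when the system size doubles and (2) how far the revised binding energy lies from the S66 reference in units of the quoted +/- 0.02 kcal/mol. The function names are hypothetical, the pure power-law cost model ignores prefactors and lower-order terms, and reading +/- 0.02 as an uncertainty is an assumption; none of this is taken from the paper's code.

```python
def cost_ratio(size_factor: float, exponent: int = 6) -> float:
    """Relative cost increase when the system size N is scaled by
    size_factor, assuming cost is proportional to N**exponent
    (an idealized model of canonical CCSD scaling)."""
    return size_factor ** exponent


def deviation_in_uncertainties(revised: float, reference: float,
                               uncertainty: float) -> float:
    """Absolute difference between two values, expressed in multiples
    of the quoted uncertainty (assumed to be a simple error bar)."""
    return abs(revised - reference) / uncertainty


# Doubling the system size under O(N^6) scaling.
print(cost_ratio(2.0))  # 64.0

# Revised value vs. S66 reference, both in kcal/mol, from the abstract.
ratio = deviation_in_uncertainties(-8.30, -8.50, 0.02)
print(round(ratio, 1))  # 10.0
```

Under these assumptions, doubling the system size raises the cost 64-fold, which is why distributed-memory parallelism matters at this level of theory, and the 0.20 kcal/mol gap to the S66 reference is about ten times the quoted +/- 0.02 kcal/mol.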