Machine learning models monitor thousands of calculations on the fly: Saving half of the computational resources and time in catalyst design

J. Chem. Theory Comput. 15, 4, 2331–2345 (2019)

J. Phys. Chem. Lett. 12, 19, 4628–4637 (2021)

More than half of the computational resources and time is wasted on unfruitful geometry optimizations during the discovery of functional transition metal complexes (TMCs). I built the first set of classifiers to predict the likelihood of calculation success: 1. a zero-cost model applied prior to calculations, which rapidly filters out the candidate calculations most likely to fail, and 2. a dynamic model applied during calculations, which monitors a running calculation on the fly and terminates it if it is predicted to fail with high confidence. Combined with uncertainty quantification to control which predictions are acted on, the dynamic model remains accurate (>95%) throughout the geometry optimization and saves more than half of the computational resources, at the cost of falsely terminating very few (1%) good calculations. In addition, the dynamic model is highly transferable because its inputs are the DFT-level information generated during a calculation. More importantly, the predictions of the dynamic classifier are interpretable and help reveal the failure modes in geometry optimizations.
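The on-the-fly monitoring described above can be illustrated with a minimal sketch. It is not the implementation from the cited papers: the `predict` callable, the feature layout, the uncertainty measure, and both cutoff values are hypothetical placeholders, chosen only to show how a termination decision might be gated on both the predicted outcome and the model's confidence.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple


@dataclass
class Decision:
    terminated: bool
    step: Optional[int]  # optimization step at which the job was stopped, if any


def monitor_optimization(
    step_features: Iterable[Sequence[float]],
    predict: Callable[[Sequence[float]], Tuple[float, float]],
    success_cutoff: float = 0.5,   # assumed value, not taken from the papers
    max_uncertainty: float = 0.2,  # assumed value, not taken from the papers
) -> Decision:
    """Stop a running job only when failure is predicted with low uncertainty.

    `predict` is a hypothetical stand-in for a trained dynamic classifier:
    it maps the features available at one optimization step to a pair
    (probability of success, uncertainty estimate).
    """
    for step, features in enumerate(step_features):
        p_success, uncertainty = predict(features)
        confident = uncertainty <= max_uncertainty
        if confident and p_success < success_cutoff:
            return Decision(terminated=True, step=step)
    # No confident failure prediction was made: let the job finish.
    return Decision(terminated=False, step=None)


def toy_predict(features: Sequence[float]) -> Tuple[float, float]:
    # Toy classifier for illustration: the two "features" are read directly
    # as (probability of success, uncertainty).
    return features[0], features[1]


# The first step predicts failure but is too uncertain to act on;
# the second step predicts failure with low uncertainty, so the job stops.
failing_job = [(0.4, 0.5), (0.3, 0.1), (0.2, 0.05)]
healthy_job = [(0.9, 0.1), (0.8, 0.1)]

failing_decision = monitor_optimization(failing_job, toy_predict)
healthy_decision = monitor_optimization(healthy_job, toy_predict)
```

Gating on uncertainty is what keeps false terminations rare in this sketch: an uncertain failure prediction is ignored, and the calculation simply continues to the next step, where more information is available.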


Beyond proof-of-principle work, I have also demonstrated the usefulness of these classifiers in the active-learning discovery of candidate catalysts for small alkane activation. With this set of classifiers for predicting the outcomes of quantum chemistry (QC) calculations, first-principles data generation is accelerated at least twofold.