Machine learning for quantum chemistry and materials discovery
Hello! My name is Chenru Duan. I hold a Bachelor's degree in Physics from Zhejiang University and am currently a Ph.D. candidate in Chemistry at MIT.
My current research interest is integrating machine learning models into quantum chemistry calculations to achieve an autonomous workflow for computational high-throughput screening and materials discovery. I have demonstrated this workflow by accelerating the discovery of functional materials and molecules, such as redox couples for redox flow batteries, catalysts for methane-to-methanol conversion, and transition metal chromophores.
During my undergraduate studies, I worked on problems in condensed matter physics, specifically the quantum dynamics of open quantum systems. I improved the numerical feasibility of applying the hierarchical equation of motion to the low-temperature spin-boson model and investigated the quantum phase transition and novel heat transport properties therein.
In my spare time, I enjoy playing electronic games, watching Japanese and Chinese anime, and hiking.
Honors and Awards
2022 Excellence Award for Graduate Student, ACS Chemical Computing Group
MolSSI Software Fellow ($50,000 to support molecular science software development in MolSSI)
2021 Best Poster Award, International Symposium on Machine Learning in Quantum Chemistry
Gold Award, MRS Graduate Student Award
Graduate Student Award, AIChE’s Computational Molecular Science and Engineering Forum
Ph.D. in Chemistry, MIT, Cambridge, MA
Doctoral advisor: Prof. Heather J. Kulik; GPA: 4.8/5.0; H-index: 14; Citations: 694; 2017 - expected Nov. 2022
B.S. in Physics, Chu Kochen Honors College, Zhejiang University, Hangzhou, China
Honors degree; GPA: 3.92/4.00 (overall), 3.95/4.00 (major); 2013 - 2017
Research Experience and Skills
Sept. 2017 - Present
Department of Chemistry, MIT, Cambridge, MA
Graduate Research Assistant; Advisor: Prof. Heather J. Kulik
Developed the first set of machine learning classifiers that monitor quantum chemistry calculations on the fly in computational high-throughput screening, saving more than half of the computational resources and time that would have been wasted on failed calculations
Developed the first semi-supervised learning classifier to identify strong static correlation in materials, achieving state-of-the-art performance on this classification task
Integrated transfer learning and uncertainty quantification into computational high-throughput screening, reducing the error of machine-learning-accelerated chemical discovery to chemical accuracy (1 kcal/mol)
Discovered functional materials with multi-objective active learning, such as redox couples for redox flow batteries, single-site catalysts for methane-to-methanol conversion, and robust transition metal chromophores
Developed proficiency with programming languages (Python and C), high-performance computing, machine learning packages (PyTorch, TensorFlow, and PyG), quantum chemistry packages (TeraChem, Psi4, ORCA, and Q-Chem), and productivity software (Jupyter, Plotly, Docker, Colab, etc.)
Published over 20 papers in peer-reviewed journals (ten first-authored); received five prestigious awards from five international professional associations; gave 13 formal presentations at conferences (three invited)
July 2017 - Sept. 2017
SMART, National University of Singapore, Singapore
Research Engineer; Advisor: Prof. Jianshu Cao
Uncovered novel heat transport behaviors in a non-commutative quantum heat engine using the heat-flux-extended hierarchical equation of motion
Developed proficiency with Fortran, Matplotlib, and LaTeX
Published two papers in peer-reviewed journals (one first-authored) and gave two departmental presentations
July 2015 - June 2017
Department of Physics, Zhejiang University, Hangzhou, China
Undergraduate Research Assistant; Advisor: Prof. Jianlan Wu
Enabled numerically exact calculations of open quantum dynamics by extending the domain of applicability of the hierarchical equation of motion, and studied the quantum phase transition of the spin-boson model
Developed proficiency with Bash, MATLAB, Mathematica, and OriginLab
Published three papers in peer-reviewed journals (two first-authored) and defended a bachelor's thesis
Machine learning models monitor thousands of calculations on the fly: saving half of the computational resources and time in catalyst design
J. Chem. Theory Comput. 15, 4, 2331–2345 (2019)
J. Phys. Chem. Lett. 12, 19, 4628–4637 (2021)
More than half of the computational resources and time are wasted on unfruitful geometry optimizations during the discovery of functional transition metal complexes (TMCs). I built the first set of classifiers to predict the likelihood of calculation success: 1. prior to calculations, as a zero-cost model that rapidly filters out the candidate calculations most likely to fail, and 2. during calculations, as a dynamic model that monitors an already running calculation on the fly and terminates it if it is predicted to fail with high confidence. Combined with model uncertainty quantification, this dynamic model stays accurate (i.e., >95%) throughout the entire geometry optimization, saving more than half of the computational resources at the cost of falsely terminating very few (1%) good calculations. In addition, this dynamic model is highly transferable because it uses the density functional theory (DFT)-level information generated during a calculation as inputs. More importantly, the predictions of the dynamic classifier are interpretable and help reveal the failure modes of geometry optimizations.
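The on-the-fly termination rule described above can be illustrated with a minimal sketch. It assumes a hypothetical ensemble of classifiers that each output a probability of failure at every optimization step, and it uses the ensemble spread as a stand-in for model uncertainty; the cutoffs `fail_cut` and `std_cut` are illustrative, and the uncertainty metric in the cited work may differ. A job is killed only when the predicted failure probability is high and the models agree.

```python
from statistics import mean, pstdev
from typing import Iterable, Optional, Sequence


def should_terminate(member_probs: Sequence[float],
                     fail_cut: float = 0.5,
                     std_cut: float = 0.1) -> bool:
    """Return True only if the ensemble confidently predicts failure.

    member_probs: P(failure) from each (hypothetical) ensemble member at the
    current optimization step. The population standard deviation of these
    values is used as an assumed proxy for model uncertainty.
    """
    p_fail = mean(member_probs)
    spread = pstdev(member_probs)
    return p_fail > fail_cut and spread < std_cut


def monitor(trajectory: Iterable[Sequence[float]], **cuts: float) -> Optional[int]:
    """Scan per-step ensemble predictions of one running calculation.

    Returns the step index at which the job would be terminated, or None if
    it is never confidently predicted to fail and runs to completion.
    """
    for step, member_probs in enumerate(trajectory):
        if should_terminate(member_probs, **cuts):
            return step
    return None


if __name__ == "__main__":
    # Hypothetical per-step failure probabilities from a three-member ensemble
    converging_to_failure = [[0.3, 0.5, 0.4], [0.9, 0.92, 0.88]]
    members_disagree = [[0.2, 0.9, 0.7], [0.3, 0.95, 0.8]]
    print(monitor(converging_to_failure))  # 1: terminated at step 1
    print(monitor(members_disagree))       # None: allowed to finish
```

Requiring low spread in addition to a high mean is what keeps good calculations from being killed when the models are unsure, trading a little extra compute for fewer false terminations.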
Beyond proof-of-principle work, I have also demonstrated the usefulness of these classifiers in the active learning discovery of candidate catalysts for small alkane activation. With this set of classifiers for predicting the outcomes of quantum chemistry (QC) calculations, first-principles data generation is accelerated at least twofold.
Smart quantum chemistry method selection aided by machine learning
J. Chem. Theory Comput. 16, 7, 4373–4387 (2020)
J. Phys. Chem. Lett. 11, 16, 6640–6648 (2020)
Many experts have devised multi-reference (MR) diagnostics to detect when correlated wavefunction theory (WFT) is needed over DFT, but these tools are impractical in large-scale materials discovery because they either require expensive calculations or behave unpredictably in new materials spaces. I first bridged the cost-accuracy gap between DFT- and WFT-based MR diagnostics by building regression models that accurately predict WFT-based diagnostics using DFT-based diagnostics and the 3D geometry of a system as inputs.
In addition, utilizing the underlying consensus among multiple MR diagnostics, I developed a semi-supervised learning model that classifies systems based on both the labeled points with extremely strong or weak MR diagnostics and the underlying distributions of the MR diagnostics of all points. This model far outperforms existing unsupervised learning (i.e., clustering) methods and the conventional cutoff-based approach widely used in the chemistry community in distinguishing systems with strong MR character. The model is readily transferable to larger systems with unseen chemical compositions. Combined with our previously built regression models, it provides faithful MR character classification at the low cost of DFT. This set of machine learning (ML) models paves the way for quickly identifying the “DFT-safe” island [6] during materials discovery and subsequently performing higher-cost correlated WFT calculations only for systems that are promising but difficult for DFT.
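The idea of letting a few confidently labeled points (extremely strong or weak MR diagnostics) inform the labels of all other points can be illustrated with label propagation on a similarity graph. This is a generic, swapped-in semi-supervised technique shown only for illustration, not the model from the cited papers; the Gaussian kernel width `sigma`, the 0.5 decision threshold, and the toy diagnostic values are assumptions.

```python
import math
from typing import Dict, List, Sequence


def label_propagation(points: Sequence[Sequence[float]],
                      seeds: Dict[int, int],
                      sigma: float = 1.0,
                      n_iter: int = 100) -> List[int]:
    """Binary label propagation on a fully connected Gaussian-kernel graph.

    points: feature vectors, e.g. several MR diagnostics per system.
    seeds:  index -> label (1 = strong MR, 0 = weak MR) for the confidently
            labeled points; these stay clamped during propagation.
    Returns a 0/1 label for every point.
    """
    n = len(points)
    # Affinity w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Soft score in [0, 1]; unlabeled points start undecided at 0.5
    f = [float(seeds.get(i, 0.5)) for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if i in seeds:
                new_f.append(float(seeds[i]))  # clamp labeled points
                continue
            total = sum(w[i])
            # Each unlabeled point takes the affinity-weighted mean of its neighbors
            new_f.append(sum(w[i][j] * f[j] for j in range(n)) / total
                         if total > 0 else f[i])
        f = new_f
    return [1 if score >= 0.5 else 0 for score in f]


if __name__ == "__main__":
    # Toy one-dimensional "diagnostic" values; only the two extremes are labeled
    diagnostics = [(0.01,), (0.02,), (0.03,), (0.10,), (0.11,), (0.12,)]
    print(label_propagation(diagnostics, {0: 0, 5: 1}, sigma=0.03))
```

Unlike a single cutoff on one diagnostic, the unlabeled points here are assigned by the structure of the whole distribution, which is the property the paragraph above relies on.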
Materials discovery with higher efficacy and accuracy
J. Am. Chem. Soc., 143, 42, 17535–17547 (2021)
J. Phys. Chem. Lett., 11, 19, 8067–8076 (2020)
ACS Cent. Sci., 6, 513–524 (2020)
With the improved autonomous workflow for computational high-throughput screening, enabled by the machine learning "decision engines" that I built, we can obtain quantum chemistry data sets more easily and with higher fidelity.
This autonomous workflow can then be coupled with established materials discovery strategies, such as active learning, efficient global optimization, and uncertainty quantification. In this way, we discovered promising candidate materials as redox couples for redox flow batteries, catalysts for methane-to-methanol conversion, and transition metal chromophores.
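The uncertainty-aware selection step of such an active learning loop can be sketched with the expected improvement (EI) criterion, the acquisition function classically used in efficient global optimization. This is a simplified, single-objective illustration that assumes each candidate has a Gaussian model prediction (mean and standard deviation); the multi-objective criteria used in the cited work are not reproduced, and the candidate names and numbers are made up.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for maximization of a Gaussian prediction N(mu, sigma^2)
    relative to the best property value observed so far."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty: plain improvement
    z = (mu - best) / sigma
    return (mu - best) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


def select_batch(preds: Sequence[Tuple[str, float, float]],
                 best: float, k: int) -> List[str]:
    """Rank (name, mean, std) predictions by EI and return the top-k names,
    i.e. the candidates to send to quantum chemistry in the next round."""
    ranked = sorted(preds,
                    key=lambda p: expected_improvement(p[1], p[2], best),
                    reverse=True)
    return [name for name, _, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical model predictions: (candidate, predicted mean, predicted std)
    candidates = [("A", 1.0, 0.05), ("B", 0.8, 0.5), ("C", 0.2, 0.1)]
    print(select_batch(candidates, best=0.9, k=2))
```

Candidate "B" is ranked above "A" despite a lower predicted mean because its large uncertainty leaves more room for improvement, which is how uncertainty quantification steers exploration in the loop.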
Harnessing the consensus of density functional approximations
Chem. Sci., 12, 39, 13021–13036 (2021)
In chemical discovery, researchers usually stick to a single density functional approximation (DFA) when generating quantum chemistry data because of the simplicity of this approach. However, relying on a single DFA may introduce systematic bias into the data set, especially for challenging materials spaces.
By investigating a large TMC data set with DFAs at different rungs of “Jacob’s ladder”, I found that good linear correlations exist among properties obtained by different DFAs, although their absolute predictions differ [7]. Therefore, ML can be used to reveal “universal” design rules that are invariant to the choice of DFA. I also found that lead compounds can depend significantly on the choice of DFA, demonstrating the risk of identifying lead compounds with a single DFA. To mitigate this risk, I proposed, for the first time, an approach that utilizes the consensus among multiple DFAs to discover robust (i.e., DFA-insensitive) lead compounds. The lead compounds discovered based on the DFA consensus agree much better with experimentally observed leads than those identified by a single DFA.
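The consensus idea can be sketched as a simple vote: rank candidates under each DFA, keep the top `top_k` per DFA, and retain only compounds that enough DFAs agree on. The voting rule, the `min_frac` threshold, and the property values below are illustrative assumptions rather than the exact procedure of the cited paper; the functional names serve only as labels.

```python
from collections import Counter
from typing import Dict, List


def consensus_leads(props: Dict[str, Dict[str, float]],
                    top_k: int,
                    min_frac: float = 1.0) -> List[str]:
    """Return compounds ranked in the top_k by at least min_frac of the DFAs.

    props maps DFA name -> {compound: property value}; larger is assumed
    better. min_frac=1.0 keeps only leads that every DFA agrees on, i.e.
    DFA-insensitive leads.
    """
    votes: Counter = Counter()
    for per_compound in props.values():
        ranked = sorted(per_compound, key=per_compound.get, reverse=True)
        votes.update(ranked[:top_k])  # one vote per DFA for each top-k compound
    needed = min_frac * len(props)
    return sorted(c for c, n in votes.items() if n >= needed)


if __name__ == "__main__":
    # Made-up property values for three hypothetical candidates under three DFAs
    props = {
        "B3LYP": {"m1": 1.2, "m2": 1.0, "m3": 0.4},
        "PBE":   {"m1": 0.9, "m2": 0.3, "m3": 0.8},
        "PBE0":  {"m1": 1.1, "m2": 0.9, "m3": 0.2},
    }
    print(consensus_leads(props, top_k=2))                # unanimous leads
    print(consensus_leads(props, top_k=2, min_frac=0.5))  # majority leads
```

A single DFA here would report a different pair of leads depending on which functional is chosen, while the unanimous vote returns only the candidate that is robust to that choice.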
Quantum dynamics for studying quantum phase transitions and heat transport
J. Phys. Chem. Lett., 11, 10, 4080–4085 (2020)
J. Chem. Phys. 147, 164112 (2017)
Phys. Rev. B 95, 214308 (2017)
Understanding non-equilibrium transport is crucial for controlling energy flow in nanoscale systems. We study thermal energy transfer in a generalized non-equilibrium spin-boson model (NESB) with non-commutative system–bath coupling operators and discover its unusual transport properties. Compared to the conventional NESB, the energy current is greatly enhanced by rotating the system–bath coupling operators. The constructive contribution to thermal rectification can be optimized when two sources of asymmetry, the system–bath coupling strength and the coupling operators, coexist. In the weak-coupling and adiabatic limit, the scaling dependence of the energy current on the coupling strength and the system energy gap changes drastically when the coupling operators become non-commutative. These novel transport properties, arising from the purely quantum effect of non-commutative coupling operators, suggest an unexplored dimension for controlling transport in nanoscale systems and should generally appear in other non-equilibrium setups and driven systems.
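For reference, one way to write a generalized NESB Hamiltonian with rotated, non-commutative coupling operators is sketched below. This is an illustrative form assumed here (ħ = 1; a two-level system with bias ε₀ and tunneling Δ; left and right harmonic baths; the right-bath coupling operator rotated by a hypothetical angle θ) and is not necessarily the exact parametrization used in the cited papers.

```latex
\begin{aligned}
H &= \frac{\epsilon_0}{2}\sigma_z + \frac{\Delta}{2}\sigma_x
   + \sum_{v = L, R} \sum_k \omega_{k,v}\, b_{k,v}^{\dagger} b_{k,v}
   + \sum_{v = L, R} \hat{S}_v \otimes \sum_k c_{k,v} \left( b_{k,v}^{\dagger} + b_{k,v} \right), \\
\hat{S}_L &= \sigma_z, \qquad
\hat{S}_R = \cos\theta\, \sigma_z + \sin\theta\, \sigma_x, \qquad
[\hat{S}_L, \hat{S}_R] = 2i \sin\theta\, \sigma_y .
\end{aligned}
```

Setting θ = 0 recovers the conventional NESB, in which both baths couple through σ_z and the coupling operators commute; any θ that is not a multiple of π makes them non-commutative, which is the regime where the unusual transport properties described above appear.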