Chenru Duan
machine learning for
quantum chemistry and materials discovery
Chenru Duan
Ph.D. Candidate
Chemistry, Chemical Engineering
MIT
Phone:
8579284855
Email:
Address:
25 Ames Street
Cambridge, MA, 02142
About Me
Hello! My name is Chenru Duan. I hold a bachelor's degree in physics from Zhejiang University and am currently a Ph.D. candidate in chemistry at MIT.
My current research focuses on integrating machine learning models into quantum chemistry calculations to build autonomous workflows for computational high-throughput screening and materials discovery. I have demonstrated this workflow by accelerating the discovery of functional materials and molecules, such as redox couples for redox flow batteries, catalysts for methane-to-methanol conversion, and transition metal chromophores.
As an undergraduate, I worked on problems in condensed matter physics, specifically the quantum dynamics of open quantum systems. I improved the numerical feasibility of applying the hierarchical equations of motion to the low-temperature spin-boson model and investigated the quantum phase transition and novel heat transport properties therein.
In my spare time, I enjoy playing and electronic games, watching Japanese and Chinese anime, and hiking.
Honors and Awards
2022 Excellence Award for Graduate Student, ACS Chemical Computing Group
MolSSI Software Fellow ($50,000 to support molecular science software development in MolSSI)
2021 Best Poster Award, International Symposium on Machine Learning in Quantum Chemistry
Gold Award, MRS Graduate Student Award
Graduate Student Award, AIChE’s Computational Molecular Science and Engineering Forum
Education
Ph.D. in Chemistry, MIT, Cambridge, MA
Doctoral advisor: Prof. Heather J. Kulik, GPA: 4.8/5.0, h-index: 14, Citations: 694; 2017 – expected Nov. 2022
B.S. in Physics, Chu Kochen Honors College, Zhejiang University, Hangzhou, China
Honors degree, GPA: 3.92/4.00 (Overall), 3.95/4.00 (Major); 2013 – 2017
Research Experience and Skills
Sept. 2017 – Present
Department of Chemistry, MIT, Cambridge, MA
Graduate Research Assistant; Advisor: Prof. Heather J. Kulik

Developed the first set of machine learning classifiers that monitor quantum chemistry calculations on the fly in computational high-throughput screening, saving more than half of the computational resources and time that would have been wasted on failed calculations

Developed the first semi-supervised learning classifier to identify strong static correlation in materials, achieving state-of-the-art performance on this classification task

Integrated transfer learning and uncertainty quantification into computational high-throughput screening, reducing the error of machine-learning-accelerated chemical discovery to chemical accuracy (1 kcal/mol)

Discovered functional materials with multi-objective active learning, such as redox couples for redox flow batteries, single-site catalysts for methane-to-methanol conversion, and robust transition metal chromophores

Developed proficiency with programming languages (Python and C), high-performance computing, machine learning packages (PyTorch, TensorFlow, and PyG), quantum chemistry packages (TeraChem, Psi4, ORCA, and Q-Chem), and productivity software (Jupyter, Plotly, Docker, Colab, etc.)

Published over 20 papers in peer-reviewed journals (ten first-authored); received five prestigious awards from five international professional associations; gave 13 formal presentations at conferences (three invited)

July 2017 – Sept. 2017
SMART, National University of Singapore, Singapore
Research Engineer; Advisor: Prof. Jianshu Cao

Uncovered novel heat transport behaviors in a noncommutative quantum heat engine using the heat-flux-extended hierarchical equations of motion

Developed proficiency with Fortran, Matplotlib, and LaTeX

Published two papers in peer-reviewed journals (one first-authored) and gave two departmental presentations

July 2015 – June 2017
Department of Physics, Zhejiang University, Hangzhou, China
Undergraduate Research Assistant; Advisor: Prof. Jianlan Wu

Enabled numerically exact calculations of open quantum dynamics by extending the domain of applicability of the hierarchical equations of motion, and studied the quantum phase transition of the spin-boson model

Developed proficiency with Bash, MATLAB, Mathematica, and OriginLab

Published three papers in peer-reviewed journals (two first-authored) and defended one bachelor's thesis
Machine learning models monitor thousands of calculations on the fly: saving half of the computational resources and time in catalyst design
J. Chem. Theory Comput. 15, 4, 2331–2345 (2019)
J. Phys. Chem. Lett. 12, 19, 4628–4637 (2021)
More than half of the computational resources and time are wasted on unfruitful geometry optimizations during the discovery of functional transition metal complexes (TMCs). I built the first set of classifiers to predict the likelihood of calculation success: (1) a zero-cost model that, prior to any calculation, rapidly filters out the candidates most likely to fail, and (2) a dynamic model that monitors a running calculation on the fly and terminates it if it is predicted to fail with high confidence. Combined with uncertainty quantification, the dynamic model stays accurate (i.e., >95%) throughout the geometry optimization, saving more than half of the computational resources at the cost of falsely terminating very few (1%) good calculations. In addition, the dynamic model is highly transferable because it uses the DFT-level information generated during a calculation as input. More importantly, its predictions are interpretable and help reveal the failure modes of geometry optimizations.
Beyond proof-of-principle work, I have also demonstrated the usefulness of these classifiers in the active-learning discovery of candidate catalysts for small-alkane activation. With this set of classifiers for predicting the outcomes of quantum chemistry (QC) calculations, first-principles data generation is accelerated at least twofold.
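The on-the-fly monitoring idea can be pictured as a loop that consults a classifier after each optimization step and stops the job only when a failure prediction is both likely and confident. The sketch below is a minimal illustration under stated assumptions, not the published implementation: `monitor`, `toy_predictor`, the thresholds `p_cut` and `u_cut`, and the per-step energy features are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical model interface: trajectory so far ->
# (probability of failure, uncertainty of that prediction).
Predictor = Callable[[List[float]], Tuple[float, float]]


def monitor(steps: Iterable[float], predict_failure: Predictor,
            p_cut: float = 0.9, u_cut: float = 0.2) -> Tuple[str, int]:
    """Return ("terminated", step) or ("completed", number_of_steps)."""
    history: List[float] = []
    for i, feature in enumerate(steps, start=1):
        history.append(feature)
        p_fail, uncertainty = predict_failure(history)
        # Act only on confident predictions; otherwise let the job continue.
        if p_fail >= p_cut and uncertainty <= u_cut:
            return "terminated", i
    return "completed", len(history)


def toy_predictor(history: List[float]) -> Tuple[float, float]:
    """Placeholder model (an assumption): failure looks likely when the
    energy keeps rising; uncertainty shrinks as more steps are seen."""
    rises = sum(b > a for a, b in zip(history, history[1:]))
    p_fail = rises / max(len(history) - 1, 1)
    uncertainty = 1.0 / len(history)
    return p_fail, uncertainty


diverging = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # energy rising every step
converging = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]       # energy falling every step
print(monitor(diverging, toy_predictor))
print(monitor(converging, toy_predictor))
```

Requiring low uncertainty before terminating is what keeps the number of falsely stopped calculations small in this toy setup: early in the run the placeholder model is deliberately treated as unreliable.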
Decision engine
Smart quantum chemistry method selection aided by machine learning
J. Chem. Theory Comput. 16, 7, 4373–4387 (2020)
J. Phys. Chem. Lett. 11, 16, 6640–6648 (2020)
Many experts have devised multireference (MR) diagnostics to detect when correlated wavefunction theory (WFT) is needed over DFT, but these tools are impractical in large-scale materials discovery because they either require expensive calculations or behave unpredictably in new materials spaces. I first bridged the cost-accuracy trade-off between DFT- and WFT-based MR diagnostics by building regression models that accurately predict the WFT-based diagnostics using DFT-based diagnostics and the 3D geometry of a system as inputs.
In addition, utilizing the underlying consensus among multiple MR diagnostics, I developed a semi-supervised learning model that classifies systems based on both the labeled points with extremely strong or weak MR diagnostics and the underlying distribution of the MR diagnostics of all points. This model far outperforms the existing unsupervised learning (i.e., clustering) methods and the conventional cutoff-based approach widely used in the chemistry community in distinguishing systems with strong MR character. It is also readily transferable to larger systems with unseen chemical compositions. Combined with the regression models above, it yields faithful MR character classification at the low cost of DFT. This set of ML models paves the way for quickly identifying the “DFT-safe” island during materials discovery and subsequently performing higher-cost correlated WFT calculations only for systems that are promising but difficult for DFT.
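The semi-supervised idea, labeling only the systems with extreme diagnostic values and letting the rest inherit labels through the shape of the data distribution, can be illustrated with a simple nearest-neighbor label propagation. This is a stand-in technique chosen for brevity, not the model from the papers; the function name `propagate_labels`, the toy diagnostic values, and the choice of `k` are assumptions.

```python
import math


def propagate_labels(points, labels, k=2, n_iter=50):
    """Simple k-nearest-neighbor label propagation.

    points: list of tuples (diagnostic values for each system)
    labels: list with 1.0 (strong MR), 0.0 (weak MR), or None (unlabeled)
    Returns a score in [0, 1] for every point.
    """
    n = len(points)
    # Indices of the k nearest neighbors of each point (excluding itself).
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(order[:k])

    # Unlabeled points start undecided at 0.5.
    scores = [0.5 if lab is None else lab for lab in labels]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(labels[i])  # labeled points stay fixed
            else:
                updated.append(sum(scores[j] for j in neighbors[i]) / k)
        scores = updated
    return scores


# Toy one-dimensional "diagnostic": two clusters, one extreme label in each.
points = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
labels = [0.0, None, None, None, None, 1.0]
scores = propagate_labels(points, labels)
print([round(s, 3) for s in scores])
```

In this toy example the unlabeled points in each cluster drift toward the label of the single extreme point they are connected to, which is the sense in which the data distribution, and not only the labeled extremes, drives the classification.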
Materials discovery with higher efficacy and accuracy
J. Am. Chem. Soc., 143, 42, 17535–17547 (2021)
J. Phys. Chem. Lett., 11, 19, 8067–8076 (2020)
ACS Cent. Sci., 6, 413–524 (2020)
With the improved autonomous workflow for computational high-throughput screening, enabled by the machine learning "decision engines" that I built, we can obtain quantum chemistry data sets with greater ease and higher fidelity.
This autonomous workflow can then be coupled with established materials discovery strategies, such as active learning, efficient global optimization, and uncertainty quantification. In this way, we discovered promising candidate materials for redox couples in redox flow batteries, catalysts for methane-to-methanol conversion, and transition metal chromophores.
Harnessing the consensus of density functional approximations
Chem. Sci., 12, 39, 13021–13036 (2021)
During chemical discovery, practitioners usually stick to a single density functional approximation (DFA) for QC data generation because of the simplicity of the approach. However, this choice may introduce systematic bias into the data set, especially for challenging materials spaces.
By investigating a large TMC data set with DFAs from different rungs of “Jacob’s ladder”, I found that good linear correlations exist among properties obtained with different DFAs, although their absolute predictions differ. Therefore, ML can be used to reveal “universal” design rules that are invariant to the choice of DFA. I also found that lead compounds can depend significantly on the DFA, demonstrating the risk of identifying lead compounds with a single DFA. To alleviate this risk, I proposed, for the first time, an approach that uses the consensus among multiple DFAs to discover robust (i.e., DFA-insensitive) lead compounds. Lead compounds discovered from the DFA consensus agree much better with experimentally observed leads than those identified by a single DFA.
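One simple way to picture consensus-based selection is to keep only the candidates that rank among the best under every functional. The sketch below assumes this top-k intersection rule, which may differ from the approach used in the paper; the function name `consensus_leads`, the labels "DFA-1" to "DFA-3", and all property values are invented for illustration.

```python
def consensus_leads(predictions, top_k):
    """Return the candidates that rank in the top_k under every DFA.

    predictions: dict mapping a DFA name to a dict of candidate -> property
                 (higher is better in this toy setup).
    """
    lead_sets = []
    for values in predictions.values():
        ranked = sorted(values, key=values.get, reverse=True)
        lead_sets.append(set(ranked[:top_k]))
    # Only candidates selected by all DFAs survive.
    return sorted(set.intersection(*lead_sets))


toy = {
    "DFA-1": {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.1},
    "DFA-2": {"A": 0.7, "B": 0.1, "C": 0.6, "D": 0.2},
    "DFA-3": {"A": 0.8, "B": 0.3, "C": 0.9, "D": 0.1},
}
print(consensus_leads(toy, top_k=2))
```

Here candidates B and C each look like a lead under some functionals but not others, so a single-DFA screen could select either, while only A survives the consensus.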
Quantum dynamics to study quantum phase transition and heat transport
J. Phys. Chem. Lett., 11, 10, 4080–4085 (2020)
J. Chem. Phys. 147, 164112 (2017)
Phys. Rev. B 95, 214308 (2017)
Understanding nonequilibrium transport is crucial for controlling energy flow in nanoscale systems. We study thermal energy transfer in a generalized nonequilibrium spin-boson (NESB) model with noncommutative system–bath coupling operators and discover its unusual transport properties. Compared to the conventional NESB model, the energy current is greatly enhanced by rotating the system–bath coupling operators. The constructive contribution to thermal rectification can be optimized when two sources of asymmetry, the system–bath coupling strength and the coupling operators, coexist. In the weak-coupling and adiabatic limits, the scaling of the energy current with the coupling strength and the system energy gap changes drastically when the coupling operators become noncommutative. These novel transport properties, arising from the purely quantum effect of noncommutative coupling operators, suggest an unexplored dimension for controlling transport in nanoscale systems and should appear generally in other nonequilibrium setups and driven systems.