postgraduate thesis: Machine learning based approach on accurate prediction of thermodynamic properties

Title: Machine learning based approach on accurate prediction of thermodynamic properties
Authors: Yang, Guanya [楊冠雅]
Advisor(s): Chen, G; Che, CM
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yang, G. [楊冠雅]. (2020). Machine learning based approach on accurate prediction of thermodynamic properties. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: With the surge in studies of machine learning, particularly deep learning, various sophisticated fundamental approaches to applying machine learning in quantum chemistry have been demonstrated. Successful attempts reflect the immense potential of machine learning methods for improving first-principles methods, including directly predicting molecular properties, accelerating the solution of the Schrödinger equation, improving the Hamiltonian, and increasing simulation accuracy. Machine learning based quantum chemistry studies show that fast and accurate results can be obtained without high-cost computations. However, limitations such as the shortage of high-quality theoretical or experimental training data mean that the majority of deep learning research focuses on predicting target properties at a low level of theory and fails to improve accuracy beyond that level. With a good balance between accuracy and cost, density functional theory (DFT) calculations using the hybrid functional B3LYP dominate the estimation of thermodynamic properties, yet remain far from reaching chemical accuracy. Because approaching chemical accuracy is the ultimate goal of every computational chemist, the development of machine learning based methods with high accuracy and low cost is desirable despite the success of DFT. In this thesis, a combined multiple linear regression and Bayesian neural network model is built to calibrate heats of formation calculated at the B3LYP level against experimental results. The complex systematic error in the DFT calculations is studied and corrected to the level of chemical accuracy by this multi-step model. In particular, the proposed method shows a strong capability to extrapolate to larger molecules even though the model is constructed from information on smaller molecules only.
The curse of dimensionality in chemical space remains one of the most visible and challenging issues in machine learning based quantum chemistry research. Because molecules inherently have graph-like geometries, they provide a strong inductive bias for graph neural networks, which operate directly on graph-structured representations of molecules instead of requiring complex feature engineering. To further bypass first-principles calculations, a deep neural network is trained on graph-structured molecular data to predict a thermodynamic property, the heat of formation, with DFT-level accuracy. Following this training, the earlier approach to correcting the systematic error of B3LYP calculations is applied to further reduce the difference between the predicted heat of formation and the experimental benchmark. A sophisticated method combining a message passing neural network with a Bayesian neural network is proposed to capture, accurately and effectively, the complex relationship between simple geometric information and experimental energies from a dataset of limited size.
Degree: Doctor of Philosophy
Subject: Thermochemistry; Quantum chemistry; Machine learning
Dept/Program: Chemistry
Persistent Identifier: http://hdl.handle.net/10722/298881

 

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Chen, G | - |
| dc.contributor.advisor | Che, CM | - |
| dc.contributor.author | Yang, Guanya | - |
| dc.contributor.author | 楊冠雅 | - |
| dc.date.accessioned | 2021-04-16T11:16:37Z | - |
| dc.date.available | 2021-04-16T11:16:37Z | - |
| dc.date.issued | 2020 | - |
| dc.identifier.citation | Yang, G. [楊冠雅]. (2020). Machine learning based approach on accurate prediction of thermodynamic properties. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/298881 | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Thermochemistry | - |
| dc.subject.lcsh | Quantum chemistry | - |
| dc.subject.lcsh | Machine learning | - |
| dc.title | Machine learning based approach on accurate prediction of thermodynamic properties | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Chemistry | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2021 | - |
| dc.identifier.mmsid | 991044360595603414 | - |