
Postgraduate thesis: First-order methods with applications in machine learning problems

Title: First-order methods with applications in machine learning problems
Authors: Liu, Zehua (刘泽华)
Advisor(s): Yuan, X
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, Z. [刘泽华]. (2024). First-order methods with applications in machine learning problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract:
The objective of this thesis is to advance numerical analysis by addressing contemporary challenges in training neural networks, which are emblematic of modern machine learning. Despite the remarkable success of machine learning methodologies in applications such as computer vision and natural language processing, training neural networks remains a formidable challenge due to their nonconvex nature and large scale. This thesis focuses on developing tractable first-order algorithms to train neural networks efficiently and proposes novel numerical algorithms to address these practical impediments.

The first part of the thesis concentrates on minimizing composite functions of the form $F + R$, where $F$ is a smooth, weakly convex function and $R$ is a nonsmooth, weakly convex function. This problem is highly relevant in machine learning, especially in image restoration, where both $F$ and $R$ are constructed from nonconvex neural networks. To tackle the lack of convexity in both $F$ and $R$, this thesis proposes a novel algorithm, the stochastic forward-backward splitting method (Stochastic FBSM), and discusses its convergence properties.

The second part focuses on minimax optimization, which presents theoretical and numerical challenges beyond those of standard minimization. Minimax optimization involves two competing variables and finds applications in machine learning, including reinforcement learning and generative adversarial networks (GANs). This thesis introduces practical algorithms for minimax optimization, starting with modifications to the gradient descent ascent method (GDA): two frameworks, adaptive balancing and randomized variants of GDA (Randomized-GDA and Adaptive-RGDA), are proposed, and the combination of Polyak's momentum with GDA is also considered. The thesis further explores logical training, which incorporates logical information into neural network training to improve performance; the logical information is formulated as a minimax optimization problem, and a specialized algorithm is developed to solve this structured minimax problem.

The final part focuses on bilevel optimization (BO), specifically when both the upper- and inner-level objective functions are generic nonconvex functions. To address this BO problem, the thesis converts it into a minimax problem and establishes the equivalence of the two formulations, enabling a unified framework that leverages minimax theory to solve the BO problem effectively. The theoretical study of minimax optimization thereby contributes directly to addressing the challenges posed by BO.

In summary, this thesis contributes tractable first-order algorithms for generic nonconvex problems in machine learning. The proposed algorithms are supported by rigorous theoretical analyses and extensive empirical evaluations, showing superior performance compared to existing methodologies in various machine learning applications, including computer vision and natural language processing. This research has the potential to inform and enrich optimization theories for contemporary learning paradigms.
Degree: Doctor of Philosophy
Subject: Machine learning
Dept/Program: Mathematics
Persistent Identifier: http://hdl.handle.net/10722/350339
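The abstract above names the classes of methods studied; the sketches that follow are illustrative only. First, for the composite problem $\min_x F(x) + R(x)$ with $F$ smooth and $R$ nonsmooth (both weakly convex), a generic stochastic forward-backward iteration alternates a stochastic gradient step on $F$ with a proximal step on $R$. The thesis's exact Stochastic FBSM update rule is not reproduced in this record; the step size, iteration count, and toy instance below are placeholder assumptions.

```python
import numpy as np

def stochastic_fbsm(x0, grad_F_stoch, prox_R, step=0.01, iters=1000, seed=0):
    """Generic stochastic forward-backward splitting sketch.

    x_{k+1} = prox_{step*R}(x_k - step * g_k), where g_k is a
    stochastic gradient of F at x_k. Illustrative only; not the
    thesis's exact Stochastic FBSM.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        g = grad_F_stoch(x, rng)          # forward: stochastic gradient of F
        x = prox_R(x - step * g, step)    # backward: proximal step on R
    return x

# Toy instance: F(x) = 0.5*||x - b||^2 observed with gradient noise,
# R(x) = lam*||x||_1, whose proximal operator is soft-thresholding.
b = np.array([1.0, -2.0, 0.5])
lam = 0.1
grad_F = lambda x, rng: (x - b) + 0.01 * rng.standard_normal(x.shape)
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
x_star = stochastic_fbsm(np.zeros(3), grad_F, prox_l1)
```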
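For the minimax part, the baseline method is gradient descent ascent on $\min_x \max_y f(x, y)$. A minimal simultaneous GDA loop with Polyak (heavy-ball) momentum is sketched below on a toy saddle problem; the step sizes, momentum weight, and test function are assumptions, and the thesis's Randomized-GDA and Adaptive-RGDA variants are not reproduced here.

```python
import numpy as np

def gda_momentum(grad_x, grad_y, x, y, eta_x=0.05, eta_y=0.05,
                 beta=0.3, iters=500):
    """Simultaneous GDA with heavy-ball momentum on both players:

    x_{k+1} = x_k - eta_x * f_x(x_k, y_k) + beta * (x_k - x_{k-1})
    y_{k+1} = y_k + eta_y * f_y(x_k, y_k) + beta * (y_k - y_{k-1})
    """
    x_prev, y_prev = x.copy(), y.copy()
    for _ in range(iters):
        x_new = x - eta_x * grad_x(x, y) + beta * (x - x_prev)
        y_new = y + eta_y * grad_y(x, y) + beta * (y - y_prev)
        x_prev, y_prev = x, y
        x, y = x_new, y_new
    return x, y

# Toy strongly-convex-strongly-concave saddle problem, saddle point at (0, 0):
# f(x, y) = 0.5*||x||^2 + x.T @ A @ y - 0.5*||y||^2
A = np.array([[1.0, 0.2], [0.0, 1.0]])
gx = lambda x, y: x + A @ y        # gradient of f in x
gy = lambda x, y: A.T @ x - y      # gradient of f in y
x_out, y_out = gda_momentum(gx, gy, np.ones(2), -np.ones(2))
```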
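Finally, the bilevel part converts a nonconvex bilevel program into an equivalent minimax problem. The thesis's precise equivalence construction is not given in this record; one standard value-function route for such a conversion, with $\lambda$ a penalty parameter introduced here purely for illustration, looks as follows.

```latex
% Illustrative value-function reformulation of a bilevel problem
% (a sketch; not necessarily the thesis's exact construction).
\begin{align*}
  &\text{Bilevel problem:} &
    &\min_{x} \; f\bigl(x, y^{*}(x)\bigr)
      \quad \text{s.t.} \quad
      y^{*}(x) \in \operatorname*{arg\,min}_{y} g(x, y), \\
  &\text{Value function:} &
    &v(x) := \min_{z} \; g(x, z), \\
  &\text{Constrained form:} &
    &\min_{x, y} \; f(x, y)
      \quad \text{s.t.} \quad
      g(x, y) - v(x) \le 0, \\
  &\text{Minimax form } (\lambda > 0)\text{:} &
    &\min_{x, y} \; \max_{z} \;
      f(x, y) + \lambda \bigl( g(x, y) - g(x, z) \bigr).
\end{align*}
```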

 

DC Field | Value | Language
dc.contributor.advisor | Yuan, X | -
dc.contributor.author | Liu, Zehua | -
dc.contributor.author | 刘泽华 | -
dc.date.accessioned | 2024-10-23T09:46:19Z | -
dc.date.available | 2024-10-23T09:46:19Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Liu, Z. [刘泽华]. (2024). First-order methods with applications in machine learning problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/350339 | -
dc.description.abstract | (identical to the Abstract field above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Machine learning | -
dc.title | First-order methods with applications in machine learning problems | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Mathematics | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044860752203414 | -
