Appears in Collections: postgraduate thesis: Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems
| Title | Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems |
|---|---|
| Authors | Wang, Zimeng (王子濛) |
| Advisors | Yuan, X |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Wang, Z. [王子濛]. (2025). Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | This thesis addresses a class of large-scale nonsmooth optimization problems prevalent in machine learning, where the objective function comprises a convex and smooth function (expressed as the average of a large number of differentiable function components) and a convex but potentially nonsmooth regularizer. Traditional first-order and deterministic methods for solving such problems often suffer from slow convergence or high computational costs. To address these limitations, we propose stochastic proximal Newton-type methods that incorporate both gradient and Hessian information to generate more accurate search directions by (approximately) solving some subproblems. We first introduce a general algorithmic framework for approximate stochastic proximal Newton-type methods, independent of specific gradient and Hessian approximation schemes. Within this framework, we establish global and local convergence guarantees based on expected error bounds for gradient and Hessian approximations, as well as the accuracy of subproblem resolutions. Building on this framework, we develop three practical algorithms with tailored gradient and Hessian approximation schemes and stopping criteria. The first algorithm is an inexact subsampled proximal Newton method, where gradient and Hessian approximations are computed via data subsampling. To improve computational efficiency, we introduce an adaptive stopping criterion for subproblem resolutions that controls violations from first-order optimality conditions. We prove that this method achieves globally linear and locally superlinear convergence with appropriately increasing sample sizes. The second algorithm is a single-loop stochastic proximal L-BFGS method, which accelerates convergence by incorporating variance reduction (VR) techniques. Assuming exact subproblem resolutions, we employ a loopless SVRG method for stochastic gradient estimation and a stochastic limited-memory BFGS (L-BFGS) scheme for Hessian approximations, resulting in a simple single-loop structure. We extend this paradigm to accommodate a broader class of VR methods (e.g., SAGA and SEGA) and establish globally linear convergence under mild assumptions. The third algorithm is an inexact stochastic proximal Newton-type method that uses the stochastic variance-reduced gradient (SVRG) technique for efficient gradient estimation. This method allows flexible Hessian approximations, requiring only uniform boundedness from above and below. We develop two practical stopping criteria for subproblem resolutions, tailored for smooth and nonsmooth objectives respectively. In both smooth and nonsmooth settings, we show that this method converges linearly for strongly convex objectives and sublinearly for nonconvex objectives under mild assumptions. Last but not least, we meticulously analyze the nonsmooth subproblems arising in these stochastic proximal Newton-type methods and propose a highly efficient and easily implementable semismooth Newton (SSN) solver. By leveraging a compact representation of the L-BFGS matrix and storing auxiliary matrices, we show that the arithmetic operations per iteration scale merely linearly with the problem dimension. The effectiveness and numerical efficiency of the proposed algorithms, along with the fast SSN solver, are validated through extensive numerical experiments, demonstrating superior performance compared to several state-of-the-art methods. |
| Degree | Doctor of Philosophy |
| Subject | Nonsmooth optimization; Convex functions; Newton-Raphson method |
| Dept/Program | Mathematics |
| Persistent Identifier | http://hdl.handle.net/10722/367488 |
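
For orientation, the composite problem and the proximal Newton-type step described in the abstract can be written in a standard form. The notation below is our own sketch and may differ from the thesis's.

```latex
% Composite objective: smooth finite-sum part plus a convex, possibly nonsmooth regularizer h
\min_{x \in \mathbb{R}^d} \; F(x) := f(x) + h(x),
\qquad f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x).

% One (inexact) stochastic proximal Newton-type step, with g_k \approx \nabla f(x_k)
% and B_k \approx \nabla^2 f(x_k) built from stochastic information:
x_{k+1} \approx \operatorname*{arg\,min}_{x} \;
  g_k^{\top}(x - x_k) + \tfrac{1}{2}(x - x_k)^{\top} B_k (x - x_k) + h(x).
```

The adaptive stopping criteria mentioned in the abstract govern how accurately this subproblem is solved at each iteration.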
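The second and third algorithms rely on variance-reduced gradient estimates. A minimal sketch of a loopless SVRG estimator of the kind the abstract mentions is given below; the helper names (`grad_i`, `loopless_svrg_gradient`) are hypothetical and this is not the thesis's code.

```python
import numpy as np

def loopless_svrg_gradient(grad_i, n, x, state, p=0.1, rng=None):
    """One loopless-SVRG gradient estimate (illustrative sketch).

    grad_i(i, x) returns the gradient of the i-th component f_i at x.
    `state` holds a reference point w and the full gradient at w; with
    probability p the reference is refreshed, otherwise it is reused.
    """
    rng = rng or np.random.default_rng()
    if state is None or rng.random() < p:
        w = x.copy()
        full_grad = np.mean([grad_i(i, w) for i in range(n)], axis=0)
        state = (w, full_grad)
    w, full_grad = state
    i = rng.integers(n)
    # Variance-reduced estimate: unbiased for the full gradient at x,
    # with variance shrinking as x and w approach the solution.
    g = grad_i(i, x) - grad_i(i, w) + full_grad
    return g, state
```

Because the reference point is refreshed by a coin flip rather than in an outer loop, the resulting method keeps the simple single-loop structure described in the abstract.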
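The claim that per-iteration arithmetic scales linearly with the problem dimension rests on the compact representation of the L-BFGS matrix. The sketch below uses the standard Byrd-Nocedal-Schnabel compact form under our own naming; the thesis's SSN solver likely caches and updates these factors more carefully.

```python
import numpy as np

def lbfgs_compact(S, Y, sigma):
    """Compact factors with B = sigma*I - Psi @ inv(M) @ Psi.T.

    S, Y are d-by-m matrices of stored curvature pairs (s_j, y_j);
    sigma > 0 scales the initial Hessian approximation B_0 = sigma*I.
    """
    StY = S.T @ Y
    D = np.diag(np.diag(StY))        # diag(s_j^T y_j)
    L = np.tril(StY, k=-1)           # strictly lower-triangular part of S^T Y
    Psi = np.hstack([sigma * S, Y])  # d x 2m
    M = np.block([[sigma * (S.T @ S), L],
                  [L.T, -D]])        # 2m x 2m
    return Psi, M

def apply_B(v, Psi, M, sigma):
    # B v costs O(d*m + m^3): a scaled-identity term plus a low-rank correction,
    # i.e., linear in the dimension d for fixed memory m.
    return sigma * v - Psi @ np.linalg.solve(M, Psi.T @ v)
```

Since the memory size m is a small constant, every Hessian-vector product inside the semismooth Newton solver stays linear in d, which is the source of the linear per-iteration cost stated in the abstract.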
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Yuan, X | - |
| dc.contributor.author | Wang, Zimeng | - |
| dc.contributor.author | 王子濛 | - |
| dc.date.accessioned | 2025-12-11T06:42:25Z | - |
| dc.date.available | 2025-12-11T06:42:25Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Wang, Z. [王子濛]. (2025). Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/367488 | - |
| dc.description.abstract | This thesis addresses a class of large-scale nonsmooth optimization problems prevalent in machine learning, where the objective function comprises a convex and smooth function (expressed as the average of a large number of differentiable function components) and a convex but potentially nonsmooth regularizer. Traditional first-order and deterministic methods for solving such problems often suffer from slow convergence or high computational costs. To address these limitations, we propose stochastic proximal Newton-type methods that incorporate both gradient and Hessian information to generate more accurate search directions by (approximately) solving some subproblems. We first introduce a general algorithmic framework for approximate stochastic proximal Newton-type methods, independent of specific gradient and Hessian approximation schemes. Within this framework, we establish global and local convergence guarantees based on expected error bounds for gradient and Hessian approximations, as well as the accuracy of subproblem resolutions. Building on this framework, we develop three practical algorithms with tailored gradient and Hessian approximation schemes and stopping criteria. The first algorithm is an inexact subsampled proximal Newton method, where gradient and Hessian approximations are computed via data subsampling. To improve computational efficiency, we introduce an adaptive stopping criterion for subproblem resolutions that controls violations from first-order optimality conditions. We prove that this method achieves globally linear and locally superlinear convergence with appropriately increasing sample sizes. The second algorithm is a single-loop stochastic proximal L-BFGS method, which accelerates convergence by incorporating variance reduction (VR) techniques. Assuming exact subproblem resolutions, we employ a loopless SVRG method for stochastic gradient estimation and a stochastic limited-memory BFGS (L-BFGS) scheme for Hessian approximations, resulting in a simple single-loop structure. We extend this paradigm to accommodate a broader class of VR methods (e.g., SAGA and SEGA) and establish globally linear convergence under mild assumptions. The third algorithm is an inexact stochastic proximal Newton-type method that uses the stochastic variance-reduced gradient (SVRG) technique for efficient gradient estimation. This method allows flexible Hessian approximations, requiring only uniform boundedness from above and below. We develop two practical stopping criteria for subproblem resolutions, tailored for smooth and nonsmooth objectives respectively. In both smooth and nonsmooth settings, we show that this method converges linearly for strongly convex objectives and sublinearly for nonconvex objectives under mild assumptions. Last but not least, we meticulously analyze the nonsmooth subproblems arising in these stochastic proximal Newton-type methods and propose a highly efficient and easily implementable semismooth Newton (SSN) solver. By leveraging a compact representation of the L-BFGS matrix and storing auxiliary matrices, we show that the arithmetic operations per iteration scale merely linearly with the problem dimension. The effectiveness and numerical efficiency of the proposed algorithms, along with the fast SSN solver, are validated through extensive numerical experiments, demonstrating superior performance compared to several state-of-the-art methods. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Nonsmooth optimization | - |
| dc.subject.lcsh | Convex functions | - |
| dc.subject.lcsh | Newton-Raphson method | - |
| dc.title | Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Mathematics | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045147153203414 | - |
