Appears in Collections: postgraduate thesis: Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems
| Title | Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems |
|---|---|
| Authors | Wang, Zimeng (王子濛) |
| Advisors | Yuan, X |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Wang, Z. [王子濛]. (2025). Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | This thesis addresses a class of large-scale nonsmooth optimization problems prevalent in machine learning, where the objective function comprises a convex and smooth function (expressed as the average of a large number of differentiable function components) and a convex but potentially nonsmooth regularizer. Traditional first-order and deterministic methods for solving such problems often suffer from slow convergence or high computational costs. To address these limitations, we propose stochastic proximal Newton-type methods that incorporate both gradient and Hessian information to generate more accurate search directions by (approximately) solving some subproblems. We first introduce a general algorithmic framework for approximate stochastic proximal Newton-type methods, independent of specific gradient and Hessian approximation schemes. Within this framework, we establish global and local convergence guarantees based on expected error bounds for gradient and Hessian approximations, as well as the accuracy of subproblem resolutions. Building on this framework, we develop three practical algorithms with tailored gradient and Hessian approximation schemes and stopping criteria. The first algorithm is an inexact subsampled proximal Newton method, where gradient and Hessian approximations are computed via data subsampling. To improve computational efficiency, we introduce an adaptive stopping criterion for subproblem resolutions that controls violations from first-order optimality conditions. We prove that this method achieves globally linear and locally superlinear convergence with appropriately increasing sample sizes. The second algorithm is a single-loop stochastic proximal L-BFGS method, which accelerates convergence by incorporating variance reduction (VR) techniques. Assuming exact subproblem resolutions, we employ a loopless SVRG method for stochastic gradient estimation and a stochastic limited-memory BFGS (L-BFGS) scheme for Hessian approximations, resulting in a simple single-loop structure. We extend this paradigm to accommodate a broader class of VR methods (e.g., SAGA and SEGA) and establish globally linear convergence under mild assumptions. The third algorithm is an inexact stochastic proximal Newton-type method that uses the stochastic variance-reduced gradient (SVRG) technique for efficient gradient estimation. This method allows flexible Hessian approximations, requiring only uniform boundedness from above and below. We develop two practical stopping criteria for subproblem resolutions, tailored for smooth and nonsmooth objectives respectively. In both smooth and nonsmooth settings, we show that this method converges linearly for strongly convex objectives and sublinearly for nonconvex objectives under mild assumptions. Last but not least, we meticulously analyze the nonsmooth subproblems arising in these stochastic proximal Newton-type methods and propose a highly efficient and easily implementable semismooth Newton (SSN) solver. By leveraging a compact representation of the L-BFGS matrix and storing auxiliary matrices, we show that the arithmetic operations per iteration scale merely linearly with the problem dimension. The effectiveness and numerical efficiency of the proposed algorithms, along with the fast SSN solver, are validated through extensive numerical experiments, demonstrating superior performance compared to several state-of-the-art methods. |
| Degree | Doctor of Philosophy |
| Subject | Nonsmooth optimization; Convex functions; Newton-Raphson method |
| Dept/Program | Mathematics |
| Persistent Identifier | http://hdl.handle.net/10722/367488 |
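
For orientation, the composite problem and the proximal Newton-type step described in the abstract can be written in a standard form. The notation below is our own sketch and may differ from the thesis's.

```latex
% Composite objective: smooth finite-sum part plus a convex, possibly nonsmooth regularizer h
\min_{x \in \mathbb{R}^d} \; F(x) := f(x) + h(x),
\qquad f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x).

% One (inexact) stochastic proximal Newton-type step, with g_k \approx \nabla f(x_k)
% and B_k \approx \nabla^2 f(x_k) built from stochastic information:
x_{k+1} \approx \operatorname*{arg\,min}_{x} \;
  g_k^{\top}(x - x_k) + \tfrac{1}{2}(x - x_k)^{\top} B_k (x - x_k) + h(x).
```

The adaptive stopping criteria mentioned in the abstract govern how accurately this subproblem is solved at each iteration.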
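The second and third algorithms rely on variance-reduced gradient estimates. A minimal sketch of a loopless SVRG estimator of the kind the abstract mentions is given below; the helper names (`grad_i`, `loopless_svrg_gradient`) are hypothetical and this is not the thesis's code.

```python
import numpy as np

def loopless_svrg_gradient(grad_i, n, x, state, p=0.1, rng=None):
    """One loopless-SVRG gradient estimate (illustrative sketch).

    grad_i(i, x) returns the gradient of the i-th component f_i at x.
    `state` holds a reference point w and the full gradient at w; with
    probability p the reference is refreshed, otherwise it is reused.
    """
    rng = rng or np.random.default_rng()
    if state is None or rng.random() < p:
        w = x.copy()
        full_grad = np.mean([grad_i(i, w) for i in range(n)], axis=0)
        state = (w, full_grad)
    w, full_grad = state
    i = rng.integers(n)
    # Variance-reduced estimate: unbiased for the full gradient at x,
    # with variance shrinking as x and w approach the solution.
    g = grad_i(i, x) - grad_i(i, w) + full_grad
    return g, state
```

Because the reference point is refreshed by a coin flip rather than in an outer loop, the resulting method keeps the simple single-loop structure described in the abstract.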
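The claim that per-iteration arithmetic scales linearly with the problem dimension rests on the compact representation of the L-BFGS matrix. The sketch below uses the standard Byrd-Nocedal-Schnabel compact form under our own naming; the thesis's SSN solver likely caches and updates these factors more carefully.

```python
import numpy as np

def lbfgs_compact(S, Y, sigma):
    """Compact factors with B = sigma*I - Psi @ inv(M) @ Psi.T.

    S, Y are d-by-m matrices of stored curvature pairs (s_j, y_j);
    sigma > 0 scales the initial Hessian approximation B_0 = sigma*I.
    """
    StY = S.T @ Y
    D = np.diag(np.diag(StY))        # diag(s_j^T y_j)
    L = np.tril(StY, k=-1)           # strictly lower-triangular part of S^T Y
    Psi = np.hstack([sigma * S, Y])  # d x 2m
    M = np.block([[sigma * (S.T @ S), L],
                  [L.T, -D]])        # 2m x 2m
    return Psi, M

def apply_B(v, Psi, M, sigma):
    # B v costs O(d*m + m^3): a scaled-identity term plus a low-rank correction,
    # i.e., linear in the dimension d for fixed memory m.
    return sigma * v - Psi @ np.linalg.solve(M, Psi.T @ v)
```

Since the memory size m is a small constant, every Hessian-vector product inside the semismooth Newton solver stays linear in d, which is the source of the linear per-iteration cost stated in the abstract.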
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Yuan, X | - |
| dc.contributor.author | Wang, Zimeng | - |
| dc.contributor.author | 王子濛 | - |
| dc.date.accessioned | 2025-12-11T06:42:25Z | - |
| dc.date.available | 2025-12-11T06:42:25Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Wang, Z. [王子濛]. (2025). Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/367488 | - |
| dc.description.abstract | This thesis addresses a class of large-scale nonsmooth optimization problems prevalent in machine learning, where the objective function comprises a convex and smooth function (expressed as the average of a large number of differentiable function components) and a convex but potentially nonsmooth regularizer. Traditional first-order and deterministic methods for solving such problems often suffer from slow convergence or high computational costs. To address these limitations, we propose stochastic proximal Newton-type methods that incorporate both gradient and Hessian information to generate more accurate search directions by (approximately) solving some subproblems. We first introduce a general algorithmic framework for approximate stochastic proximal Newton-type methods, independent of specific gradient and Hessian approximation schemes. Within this framework, we establish global and local convergence guarantees based on expected error bounds for gradient and Hessian approximations, as well as the accuracy of subproblem resolutions. Building on this framework, we develop three practical algorithms with tailored gradient and Hessian approximation schemes and stopping criteria. The first algorithm is an inexact subsampled proximal Newton method, where gradient and Hessian approximations are computed via data subsampling. To improve computational efficiency, we introduce an adaptive stopping criterion for subproblem resolutions that controls violations from first-order optimality conditions. We prove that this method achieves globally linear and locally superlinear convergence with appropriately increasing sample sizes. The second algorithm is a single-loop stochastic proximal L-BFGS method, which accelerates convergence by incorporating variance reduction (VR) techniques. Assuming exact subproblem resolutions, we employ a loopless SVRG method for stochastic gradient estimation and a stochastic limited-memory BFGS (L-BFGS) scheme for Hessian approximations, resulting in a simple single-loop structure. We extend this paradigm to accommodate a broader class of VR methods (e.g., SAGA and SEGA) and establish globally linear convergence under mild assumptions. The third algorithm is an inexact stochastic proximal Newton-type method that uses the stochastic variance-reduced gradient (SVRG) technique for efficient gradient estimation. This method allows flexible Hessian approximations, requiring only uniform boundedness from above and below. We develop two practical stopping criteria for subproblem resolutions, tailored for smooth and nonsmooth objectives respectively. In both smooth and nonsmooth settings, we show that this method converges linearly for strongly convex objectives and sublinearly for nonconvex objectives under mild assumptions. Last but not least, we meticulously analyze the nonsmooth subproblems arising in these stochastic proximal Newton-type methods and propose a highly efficient and easily implementable semismooth Newton (SSN) solver. By leveraging a compact representation of the L-BFGS matrix and storing auxiliary matrices, we show that the arithmetic operations per iteration scale merely linearly with the problem dimension. The effectiveness and numerical efficiency of the proposed algorithms, along with the fast SSN solver, are validated through extensive numerical experiments, demonstrating superior performance compared to several state-of-the-art methods. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Nonsmooth optimization | - |
| dc.subject.lcsh | Convex functions | - |
| dc.subject.lcsh | Newton-Raphson method | - |
| dc.title | Some stochastic proximal Newton-type methods for solving large-scale nonsmooth convex optimization problems | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Mathematics | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045147153203414 | - |
