Article: Minibatch and local SGD: Algorithmic stability and linear speedup in generalization

Title: Minibatch and local SGD: Algorithmic stability and linear speedup in generalization
Authors: Lei, Yunwen; Sun, Tao; Liu, Mingrui
Keywords: Algorithmic stability; Generalization analysis; Learning theory; Stochastic gradient descent
Issue Date: 17-Jul-2025
Publisher: Elsevier
Citation: Applied and Computational Harmonic Analysis, 2025, v. 79
Abstract: The increasing scale of data has popularized the use of parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors in a multi-pass setting. By comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization of minibatch and local SGD to understand their learnability by introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show that minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
Persistent Identifier: http://hdl.handle.net/10722/366464
ISSN: 1063-5203
2023 Impact Factor: 2.6
2023 SCImago Journal Rankings: 2.231
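For intuition about the two algorithms named in the abstract above, the following is a minimal, hypothetical NumPy sketch contrasting their update rules: minibatch SGD averages gradients across workers at every step, while local SGD lets each worker run several local updates before the iterates are averaged. The least-squares objective, step sizes, and synthetic data are placeholders for illustration only, not the paper's actual analysis or experimental setup.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the toy least-squares loss 1/(2n) * ||Xw - y||^2 (illustrative objective)."""
    return X.T @ (X @ w - y) / len(y)

def minibatch_sgd(workers, w0, lr=0.1, steps=100):
    """Each step: every worker computes a gradient on its local data,
    the gradients are averaged, and one shared iterate is updated."""
    w = w0.copy()
    for _ in range(steps):
        g = np.mean([grad(w, X, y) for X, y in workers], axis=0)
        w -= lr * g
    return w

def local_sgd(workers, w0, lr=0.1, rounds=20, local_steps=5):
    """Each round: every worker runs several SGD steps on its own copy of the
    iterate, then the copies are averaged (one communication per round)."""
    w = w0.copy()
    for _ in range(rounds):
        local_iterates = []
        for X, y in workers:
            v = w.copy()
            for _ in range(local_steps):
                v -= lr * grad(v, X, y)
            local_iterates.append(v)
        w = np.mean(local_iterates, axis=0)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, K = 5, 200, 4  # dimension, samples per worker, number of workers
    w_star = rng.normal(size=d)
    workers = []
    for _ in range(K):
        X = rng.normal(size=(n, d))
        y = X @ w_star + 0.1 * rng.normal(size=n)
        workers.append((X, y))
    w0 = np.zeros(d)
    print("minibatch SGD error:", np.linalg.norm(minibatch_sgd(workers, w0) - w_star))
    print("local SGD error:    ", np.linalg.norm(local_sgd(workers, w0) - w_star))
```

In this sketch both methods perform the same total number of gradient steps; they differ only in how often the workers synchronize, which is the trade-off the paper's stability and generalization analysis addresses.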

 

DC Field: Value
dc.contributor.author: Lei, Yunwen
dc.contributor.author: Sun, Tao
dc.contributor.author: Liu, Mingrui
dc.date.accessioned: 2025-11-25T04:19:33Z
dc.date.available: 2025-11-25T04:19:33Z
dc.date.issued: 2025-07-17
dc.identifier.citation: Applied and Computational Harmonic Analysis, 2025, v. 79
dc.identifier.issn: 1063-5203
dc.identifier.uri: http://hdl.handle.net/10722/366464
dc.description.abstract: The increasing scale of data has popularized the use of parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors in a multi-pass setting. By comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization of minibatch and local SGD to understand their learnability by introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show that minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
dc.language: eng
dc.publisher: Elsevier
dc.relation.ispartof: Applied and Computational Harmonic Analysis
dc.subject: Algorithmic stability
dc.subject: Generalization analysis
dc.subject: Learning theory
dc.subject: Stochastic gradient descent
dc.title: Minibatch and local SGD: Algorithmic stability and linear speedup in generalization
dc.type: Article
dc.identifier.doi: 10.1016/j.acha.2025.101795
dc.identifier.scopus: eid_2-s2.0-105010699042
dc.identifier.volume: 79
dc.identifier.eissn: 1096-603X
dc.identifier.issnl: 1063-5203
