Article: PS+: A simple yet effective framework for fast training on parameter server

Title: PS+: A simple yet effective framework for fast training on parameter server
Authors: Jin, A-Long; Xu, Wenchao; Guo, Song; Hu, Bing; Yeung, Kwan L
Keywords: Computational efficiency; Computational modeling; Data models; Distributed training; Hardware; machine learning; parameter server; Servers; Synchronization; Training
Issue Date: 1-Dec-2022
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Parallel and Distributed Systems, 2022, v. 33, n. 12, p. 4625-4637
Abstract

In distributed training, workers collaboratively refine the global model parameters by pushing their updates to the Parameter Server and pulling fresher parameters for the next iteration. This introduces high communication costs for training at scale and incurs unproductive waiting time for workers. To minimize the waiting time, existing approaches overlap communication and computation for deep neural networks. Yet, these techniques not only require layer-by-layer model structures, but also need significant effort in runtime profiling and hyperparameter tuning. To make the overlapping optimization simple and generic, in this article we propose a new Parameter Server framework. Our solution decouples the dependency between push and pull operations and allows workers to eagerly pull the global parameters. This way, both push and pull operations can be easily overlapped with computation. Moreover, this overlapping design offers a different way to address the straggler problem, in which stale updates greatly retard the training process. In the new framework, with adequate information available to workers, they can explicitly modulate the learning rates for their updates, so the global parameters are less compromised by stale updates. We implement a prototype system in PyTorch and demonstrate its effectiveness on both CPU and GPU clusters. Experimental results show that our prototype spends up to 54% less time per iteration and needs up to 37% fewer iterations for model convergence, achieving up to 2.86× speedup over widely used synchronization schemes.
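
The abstract describes two mechanisms: decoupling push from pull so that workers eagerly pull the global parameters and overlap both transfers with computation, and letting workers modulate the learning rate applied to stale updates. The sketch below is a minimal illustration in PyTorch-style Python, not the authors' PS+ code; the class and function names, the thread-pool simulation of asynchronous transfers, and the 1/(1 + staleness) scaling rule are all assumptions made for exposition.

    # Minimal sketch, assuming a toy in-memory "server": workers push updates and
    # eagerly pull fresher parameters asynchronously, so both transfers overlap with
    # the next iteration's computation, and stale updates are damped by shrinking
    # their learning rate. All names here are illustrative, not the paper's code.

    import threading
    from concurrent.futures import ThreadPoolExecutor

    import torch


    class ToyServer:
        """Stand-in for one Parameter Server shard holding the global parameters."""

        def __init__(self, dim):
            self.params = torch.zeros(dim)
            self.version = 0
            self.lock = threading.Lock()

        def push(self, update, base_version, lr):
            """Apply a worker's update, scaled down if it was computed on old parameters."""
            with self.lock:
                staleness = self.version - base_version
                scaled_lr = lr / (1.0 + staleness)  # hypothetical staleness-aware modulation
                self.params -= scaled_lr * update
                self.version += 1

        def pull(self):
            """Return a snapshot of the global parameters and their version."""
            with self.lock:
                return self.params.clone(), self.version


    def eager_worker(server, batches, lr=0.1):
        """One worker's loop: compute, push asynchronously, pull eagerly, never block."""
        pool = ThreadPoolExecutor(max_workers=2)
        params, version = server.pull()  # initial synchronous pull
        pending_pull = None
        for x, y in batches:
            # Gradient of a toy least-squares loss on the local batch, computed on the
            # copy we already hold (it may lag the server by a step).
            w = params.clone().requires_grad_(True)
            loss = ((x @ w - y) ** 2).mean()
            loss.backward()
            grad = w.grad.detach()

            # Push this update and eagerly pull fresher parameters; neither call blocks,
            # so both transfers overlap with the next iteration's computation.
            pool.submit(server.push, grad, version, lr)
            new_pull = pool.submit(server.pull)

            # Adopt the parameters requested during the previous iteration; that pull
            # had a whole compute step to finish, so this rarely waits.
            if pending_pull is not None:
                params, version = pending_pull.result()
            pending_pull = new_pull
        pool.shutdown(wait=True)

One way to exercise the sketch is to create a single ToyServer and run several eager_worker calls in separate threads over random (x, y) batches; the gap between each push's base_version and the server's version then shows how much staleness the scaling rule absorbs.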


Persistent Identifier: http://hdl.handle.net/10722/339781
ISSN: 1045-9219 (print); 1558-2183 (online)
2023 Impact Factor: 5.6
2023 SCImago Journal Rankings: 2.340
ISI Accession Number ID: WOS:000849254300003

 

DC Field: Value
dc.contributor.author: Jin, A-Long
dc.contributor.author: Xu, Wenchao
dc.contributor.author: Guo, Song
dc.contributor.author: Hu, Bing
dc.contributor.author: Yeung, Kwan L
dc.date.accessioned: 2024-03-11T10:39:16Z
dc.date.available: 2024-03-11T10:39:16Z
dc.date.issued: 2022-12-01
dc.identifier.citation: IEEE Transactions on Parallel and Distributed Systems, 2022, v. 33, n. 12, p. 4625-4637
dc.identifier.issn: 1045-9219
dc.identifier.uri: http://hdl.handle.net/10722/339781
dc.description.abstract: In distributed training, workers collaboratively refine the global model parameters by pushing their updates to the Parameter Server and pulling fresher parameters for the next iteration. This introduces high communication costs for training at scale and incurs unproductive waiting time for workers. To minimize the waiting time, existing approaches overlap communication and computation for deep neural networks. Yet, these techniques not only require layer-by-layer model structures, but also need significant effort in runtime profiling and hyperparameter tuning. To make the overlapping optimization simple and generic, in this article we propose a new Parameter Server framework. Our solution decouples the dependency between push and pull operations and allows workers to eagerly pull the global parameters. This way, both push and pull operations can be easily overlapped with computation. Moreover, this overlapping design offers a different way to address the straggler problem, in which stale updates greatly retard the training process. In the new framework, with adequate information available to workers, they can explicitly modulate the learning rates for their updates, so the global parameters are less compromised by stale updates. We implement a prototype system in PyTorch and demonstrate its effectiveness on both CPU and GPU clusters. Experimental results show that our prototype spends up to 54% less time per iteration and needs up to 37% fewer iterations for model convergence, achieving up to 2.86× speedup over widely used synchronization schemes.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Parallel and Distributed Systems
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Computational efficiency
dc.subject: Computational modeling
dc.subject: Data models
dc.subject: Distributed training
dc.subject: Hardware
dc.subject: machine learning
dc.subject: parameter server
dc.subject: Servers
dc.subject: Synchronization
dc.subject: Training
dc.title: PS+: A simple yet effective framework for fast training on parameter server
dc.type: Article
dc.identifier.doi: 10.1109/TPDS.2022.3200518
dc.identifier.scopus: eid_2-s2.0-85137595435
dc.identifier.volume: 33
dc.identifier.issue: 12
dc.identifier.spage: 4625
dc.identifier.epage: 4637
dc.identifier.eissn: 1558-2183
dc.identifier.isi: WOS:000849254300003
dc.identifier.issnl: 1045-9219
