Article: PS+: A simple yet effective framework for fast training on parameter server
Title | PS+: A simple yet effective framework for fast training on parameter server |
---|---|
Authors | Jin, A-Long; Xu, Wenchao; Guo, Song; Hu, Bing; Yeung, Kwan L |
Keywords | Computational efficiency; Computational modeling; Data models; Distributed training; Hardware; machine learning; parameter server; Servers; Synchronization; Training |
Issue Date | 1-Dec-2022 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Parallel and Distributed Systems, 2022, v. 33, n. 12, p. 4625-4637 |
Abstract | In distributed training, workers collaboratively refine the global model parameters by pushing their updates to the Parameter Server and pulling fresher parameters for the next iteration. This introduces high communication costs for training at scale and incurs unproductive waiting time for workers. To minimize the waiting time, existing approaches overlap communication and computation for deep neural networks. Yet, these techniques not only require layer-by-layer model structures, but also demand significant effort in runtime profiling and hyperparameter tuning. To make the overlapping optimization simple and generic, in this article we propose a new Parameter Server framework. Our solution decouples the dependency between push and pull operations and allows workers to eagerly pull the global parameters. This way, both push and pull operations can be easily overlapped with computation. Moreover, this overlapping approach offers a different way to address the straggler problem, where stale updates greatly retard the training process. In the new framework, with adequate information available to workers, they can explicitly modulate the learning rates for their updates, so the global parameters are less compromised by stale updates. We implement a prototype system in PyTorch and demonstrate its effectiveness on both CPU and GPU clusters. Experimental results show that our prototype reduces per-iteration time by up to 54% and requires up to 37% fewer iterations for model convergence, achieving up to a 2.86× speedup over widely-used synchronization schemes. (A minimal illustrative sketch of the decoupled push/pull idea follows this table.) |
Persistent Identifier | http://hdl.handle.net/10722/339781 |
ISSN | 1045-9219 (2023 Impact Factor: 5.6; 2023 SCImago Journal Rankings: 2.340) |
ISI Accession Number ID | WOS:000849254300003 |
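The abstract above describes how PS+ decouples push and pull operations so that both can overlap with gradient computation, and how workers modulate their learning rates to damp stale updates. The Python sketch below is a minimal, hypothetical illustration of that idea under assumed names (ParameterServerStub, train_step, a staleness-scaled learning rate); it is not the paper's actual API or implementation.

```python
# Hypothetical sketch of the decoupled push/pull idea: the worker eagerly
# pulls fresh global parameters in the background while computing gradients,
# so that both push and pull overlap with computation. All names here are
# illustrative assumptions, not the paper's code.
import threading
import torch


class ParameterServerStub:
    """Stand-in for a remote Parameter Server (assumed interface)."""

    def __init__(self, model):
        self.global_params = [p.detach().clone() for p in model.parameters()]
        self.lock = threading.Lock()

    def push(self, grads, lr):
        # Apply a worker's (possibly stale) update; the worker has already
        # scaled lr to account for staleness.
        with self.lock:
            for p, g in zip(self.global_params, grads):
                p -= lr * g

    def pull(self):
        with self.lock:
            return [p.clone() for p in self.global_params]


def train_step(model, ps, batch, loss_fn, base_lr, staleness=0):
    # 1) Eagerly pull fresher global parameters in the background,
    #    overlapping the pull with local gradient computation.
    pulled = {}
    puller = threading.Thread(target=lambda: pulled.update(params=ps.pull()))
    puller.start()

    # 2) Compute gradients on the current (possibly slightly stale) copy.
    x, y = batch
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]

    # 3) Push the update asynchronously; scale the learning rate so that
    #    staler updates perturb the global parameters less.
    lr = base_lr / (1 + staleness)
    threading.Thread(target=ps.push, args=(grads, lr)).start()

    # 4) Install the eagerly pulled parameters for the next iteration.
    puller.join()
    with torch.no_grad():
        for p, new_p in zip(model.parameters(), pulled["params"]):
            p.copy_(new_p)
    return loss.item()
```

In this sketch the eager pull runs concurrently with backpropagation, so the worker never blocks waiting for fresh parameters, and the staleness-scaled learning rate mirrors, in simplified form, the abstract's point that workers can modulate their updates so stale gradients compromise the global model less.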
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jin, A-Long | - |
dc.contributor.author | Xu, Wenchao | - |
dc.contributor.author | Guo, Song | - |
dc.contributor.author | Hu, Bing | - |
dc.contributor.author | Yeung, Kwan L | - |
dc.date.accessioned | 2024-03-11T10:39:16Z | - |
dc.date.available | 2024-03-11T10:39:16Z | - |
dc.date.issued | 2022-12-01 | - |
dc.identifier.citation | IEEE Transactions on Parallel and Distributed Systems, 2022, v. 33, n. 12, p. 4625-4637 | - |
dc.identifier.issn | 1045-9219 | - |
dc.identifier.uri | http://hdl.handle.net/10722/339781 | - |
dc.description.abstract | In distributed training, workers collaboratively refine the global model parameters by pushing their updates to the Parameter Server and pulling fresher parameters for the next iteration. This introduces high communication costs for training at scale and incurs unproductive waiting time for workers. To minimize the waiting time, existing approaches overlap communication and computation for deep neural networks. Yet, these techniques not only require layer-by-layer model structures, but also demand significant effort in runtime profiling and hyperparameter tuning. To make the overlapping optimization simple and generic, in this article we propose a new Parameter Server framework. Our solution decouples the dependency between push and pull operations and allows workers to eagerly pull the global parameters. This way, both push and pull operations can be easily overlapped with computation. Moreover, this overlapping approach offers a different way to address the straggler problem, where stale updates greatly retard the training process. In the new framework, with adequate information available to workers, they can explicitly modulate the learning rates for their updates, so the global parameters are less compromised by stale updates. We implement a prototype system in PyTorch and demonstrate its effectiveness on both CPU and GPU clusters. Experimental results show that our prototype reduces per-iteration time by up to 54% and requires up to 37% fewer iterations for model convergence, achieving up to a 2.86× speedup over widely-used synchronization schemes. | -
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Parallel and Distributed Systems | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | Computational efficiency | - |
dc.subject | Computational modeling | - |
dc.subject | Data models | - |
dc.subject | Distributed training | - |
dc.subject | Hardware | - |
dc.subject | machine learning | - |
dc.subject | parameter server | - |
dc.subject | Servers | - |
dc.subject | Synchronization | - |
dc.subject | Training | - |
dc.title | PS+: A simple yet effective framework for fast training on parameter server | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TPDS.2022.3200518 | - |
dc.identifier.scopus | eid_2-s2.0-85137595435 | - |
dc.identifier.volume | 33 | - |
dc.identifier.issue | 12 | - |
dc.identifier.spage | 4625 | - |
dc.identifier.epage | 4637 | - |
dc.identifier.eissn | 1558-2183 | - |
dc.identifier.isi | WOS:000849254300003 | - |
dc.identifier.issnl | 1045-9219 | - |