Article: WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel

Title: WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel
Authors: Yang, Duo; Hu, Bing; Liu, An; Jin, A. Long; Yeung, Kwan L.; You, Yang
Keywords: Distributed machine learning; Heterogeneous environment; Parameter server; Stragglers; Synchronous parallel
Issue Date: 1-Sep-2024
Publisher: Elsevier
Citation: Parallel Computing: Systems & Applications, 2024, v. 121
Abstract: Parameter server is widely used in distributed machine learning to accelerate training. However, the increasing heterogeneity of workers’ computing capabilities leads to the issue of stragglers, making parameter synchronization challenging. To address this issue, we propose a solution called Worker-Busy Synchronous Parallel (WBSP). This approach eliminates the waiting time of fast workers during the synchronization process and decouples the gradient upload and model download of fast workers into asymmetric parts. By doing so, it allows fast workers to complete multiple steps of local training and upload more gradients to the server, improving computational resource utilization. Additionally, the global model is only updated when the slowest worker uploads the gradients, ensuring the consistency of global models that are pulled down by all workers and the convergence of the global model. Building upon WBSP, we propose an optimized version to further reduce the communication overhead. It enables parallel execution of communication and computation tasks on workers to shorten the global synchronization interval, thereby improving training speed. We conduct theoretical analyses for the proposed mechanisms. Extensive experiments verify that our mechanism can reduce the required time to achieve the target accuracy by up to 60% compared with the fastest method and increase the proportion of computation time from 55%–72% in existing methods to 91%.
Persistent Identifier: http://hdl.handle.net/10722/351119
ISSN: 0167-8191
2023 Impact Factor: 2.0
2023 SCImago Journal Rankings: 0.460
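
The abstract describes the core WBSP scheduling rule in prose: fast workers never idle, they keep taking local training steps and pushing gradients, while the server applies a global update only once the slowest worker's gradient has arrived, so every worker then pulls the same consistent global model. The following minimal, self-contained Python sketch illustrates that idea only; it is not the authors' implementation, and the toy one-parameter model, the speed-based step counts, and the gradient-averaging rule are all assumptions made for illustration.

import random

# Illustrative sketch (not the paper's code) of WBSP-style synchronization:
# workers start each interval from the same global model, faster workers fit
# in more local steps and push more gradients, and the server updates the
# global model only after the slowest worker's gradient has arrived.

LR = 0.1  # learning rate (assumed)

def local_gradient(model):
    """Toy gradient of f(w) = 0.5 * w^2, i.e. grad = w, plus small noise."""
    return model + random.uniform(-0.01, 0.01)

def wbsp_round(model, speeds):
    """One global synchronization interval.

    Each worker performs as many local steps as its relative speed allows
    (the slowest worker performs exactly one), pushing a gradient per step.
    The server buffers all pushed gradients and applies them only once the
    slowest worker has contributed, keeping the pulled model consistent.
    """
    slowest = min(speeds)
    buffered = []
    for speed in speeds:
        # A worker that is k times faster fits about k local steps into the
        # interval defined by the slowest worker's single step.
        local_steps = max(1, int(speed / slowest))
        local_model = model  # every worker starts from the same global model
        for _ in range(local_steps):
            g = local_gradient(local_model)
            buffered.append(g)      # push gradient to the server
            local_model -= LR * g   # keep training locally, no idle waiting
    # Global update happens only now, after the slowest worker's upload.
    return model - LR * sum(buffered) / len(buffered)

if __name__ == "__main__":
    random.seed(0)
    model = 5.0
    speeds = [1.0, 2.0, 3.0, 6.0]  # heterogeneous worker speeds (assumed)
    for r in range(5):
        model = wbsp_round(model, speeds)
        print(f"round {r}: global model = {model:.4f}")

Running the sketch prints one consistent global model per synchronization interval; in the paper's optimized variant, communication and computation on workers additionally overlap to shorten that interval.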

 

DC Field                  Value
dc.contributor.author     Yang, Duo
dc.contributor.author     Hu, Bing
dc.contributor.author     Liu, An
dc.contributor.author     Jin, A. Long
dc.contributor.author     Yeung, Kwan L.
dc.contributor.author     You, Yang
dc.date.accessioned       2024-11-10T00:30:15Z
dc.date.available         2024-11-10T00:30:15Z
dc.date.issued            2024-09-01
dc.identifier.citation    Parallel Computing: Systems & Applications, 2024, v. 121
dc.identifier.issn        0167-8191
dc.identifier.uri         http://hdl.handle.net/10722/351119
dc.description.abstract   Parameter server is widely used in distributed machine learning to accelerate training. However, the increasing heterogeneity of workers’ computing capabilities leads to the issue of stragglers, making parameter synchronization challenging. To address this issue, we propose a solution called Worker-Busy Synchronous Parallel (WBSP). This approach eliminates the waiting time of fast workers during the synchronization process and decouples the gradient upload and model download of fast workers into asymmetric parts. By doing so, it allows fast workers to complete multiple steps of local training and upload more gradients to the server, improving computational resource utilization. Additionally, the global model is only updated when the slowest worker uploads the gradients, ensuring the consistency of global models that are pulled down by all workers and the convergence of the global model. Building upon WBSP, we propose an optimized version to further reduce the communication overhead. It enables parallel execution of communication and computation tasks on workers to shorten the global synchronization interval, thereby improving training speed. We conduct theoretical analyses for the proposed mechanisms. Extensive experiments verify that our mechanism can reduce the required time to achieve the target accuracy by up to 60% compared with the fastest method and increase the proportion of computation time from 55%–72% in existing methods to 91%.
dc.language               eng
dc.publisher              Elsevier
dc.relation.ispartof      Parallel Computing: Systems & Applications
dc.rights                 This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject                Distributed machine learning
dc.subject                Heterogeneous environment
dc.subject                Parameter server
dc.subject                Stragglers
dc.subject                Synchronous parallel
dc.title                  WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel
dc.type                   Article
dc.identifier.doi         10.1016/j.parco.2024.103092
dc.identifier.scopus      eid_2-s2.0-85198006976
dc.identifier.volume      121
dc.identifier.issnl       0167-8191
