Conference Paper: Accelerating Large-Scale Distributed Neural Network Training with SPMD Parallelism

Title: Accelerating Large-Scale Distributed Neural Network Training with SPMD Parallelism
Authors: Zhang, S; Diao, L; Wu, C; Wang, S; Lin, W
Keywords: Distributed system; Neural networks; Pipeline parallelism
Issue Date: 2022
Publisher: Association for Computing Machinery
Citation: The 13th ACM Symposium on Cloud Computing (SoCC '22), San Francisco, CA, United States, November 8-10, 2022. In SoCC '22: Proceedings of the 13th Symposium on Cloud Computing, p. 403-418
Abstract: Deep neural networks (DNNs) with trillions of parameters have emerged, e.g., Mixture-of-Experts (MoE) models. Training models of this scale requires sophisticated parallelization strategies such as the newly proposed SPMD parallelism, which shards each tensor along different dimensions. A common problem when using SPMD is that computation stalls during communication due to data dependencies, resulting in low GPU utilization and long training times. We present a general technique to accelerate SPMD-based DNN training by maximizing computation-communication overlap and automating SPMD strategy search. The key idea is to duplicate the DNN model into two copies that have no dependency on each other, and interleave their execution such that computation of one copy overlaps with communication of the other. We propose a dynamic programming algorithm to automatically identify optimized sharding strategies that minimize model training time by maximally enabling computation-communication overlap. Experiments show that our designs achieve up to 61% training speed-up compared to existing frameworks.
Persistent Identifier: http://hdl.handle.net/10722/320624
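
The computation-communication overlap described in the abstract can be sketched with asynchronous collectives: while the gradients of one model copy are in flight, the other copy keeps computing. The following is a minimal, illustrative PyTorch sketch under assumed simplifications (a single-process gloo group so it runs anywhere, a toy matmul standing in for the sharded computation, placeholder gradients); the helper names `forward_copy` and `train_step` are hypothetical, and this is not the authors' implementation.

```python
import os
import torch
import torch.distributed as dist

# Single-process setup so the sketch runs anywhere; real SPMD training would
# span many GPUs/hosts and typically use NCCL instead of gloo.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

def forward_copy(x, w):
    # Stand-in for the (sharded) computation of one model copy.
    return x @ w

def train_step(batch_a, batch_b, w):
    # Copy A: compute, then launch its gradient communication asynchronously.
    out_a = forward_copy(batch_a, w)
    grad_a = torch.ones_like(out_a)  # placeholder gradient
    work_a = dist.all_reduce(grad_a, async_op=True)

    # Copy B: its computation overlaps with copy A's in-flight communication.
    out_b = forward_copy(batch_b, w)
    grad_b = torch.ones_like(out_b)
    work_b = dist.all_reduce(grad_b, async_op=True)

    # Wait for both communications to finish before applying updates.
    work_a.wait()
    work_b.wait()
    return grad_a, grad_b

if __name__ == "__main__":
    w = torch.randn(8, 8)
    train_step(torch.randn(4, 8), torch.randn(4, 8), w)
    dist.destroy_process_group()
```

In the paper's setting, the two interleaved copies are the duplicated DNN model with no mutual data dependency, and the traffic being hidden is the collective communication introduced by SPMD tensor sharding.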

 

DC Field: Value
dc.contributor.author: Zhang, S
dc.contributor.author: Diao, L
dc.contributor.author: Wu, C
dc.contributor.author: Wang, S
dc.contributor.author: Lin, W
dc.date.accessioned: 2022-10-21T07:56:50Z
dc.date.available: 2022-10-21T07:56:50Z
dc.date.issued: 2022
dc.identifier.citation: The 13th ACM Symposium on Cloud Computing (SoCC '22), San Francisco, CA, United States, November 8-10, 2022. In SoCC '22: Proceedings of the 13th Symposium on Cloud Computing, p. 403-418
dc.identifier.uri: http://hdl.handle.net/10722/320624
dc.description.abstract: Deep neural networks (DNNs) with trillions of parameters have emerged, e.g., Mixture-of-Experts (MoE) models. Training models of this scale requires sophisticated parallelization strategies such as the newly proposed SPMD parallelism, which shards each tensor along different dimensions. A common problem when using SPMD is that computation stalls during communication due to data dependencies, resulting in low GPU utilization and long training times. We present a general technique to accelerate SPMD-based DNN training by maximizing computation-communication overlap and automating SPMD strategy search. The key idea is to duplicate the DNN model into two copies that have no dependency on each other, and interleave their execution such that computation of one copy overlaps with communication of the other. We propose a dynamic programming algorithm to automatically identify optimized sharding strategies that minimize model training time by maximally enabling computation-communication overlap. Experiments show that our designs achieve up to 61% training speed-up compared to existing frameworks.
dc.language: eng
dc.publisher: Association for Computing Machinery
dc.relation.ispartof: SoCC '22: Proceedings of the 13th Symposium on Cloud Computing
dc.rights: SoCC '22: Proceedings of the 13th Symposium on Cloud Computing. Copyright © Association for Computing Machinery.
dc.subject: Distributed system
dc.subject: Neural networks
dc.subject: Pipeline parallelism
dc.title: Accelerating Large-Scale Distributed Neural Network Training with SPMD Parallelism
dc.type: Conference_Paper
dc.identifier.email: Wu, C: cwu@cs.hku.hk
dc.identifier.authority: Wu, C=rp01397
dc.identifier.doi: 10.1145/3542929.3563487
dc.identifier.hkuros: 340525
dc.identifier.spage: 403
dc.identifier.epage: 418
dc.publisher.place: United States
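
The sharding-strategy search mentioned in dc.description.abstract can likewise be illustrated with a toy dynamic program over per-layer sharding choices. Everything below is a hypothetical simplification for illustration: the two candidate shardings ("row" and "col"), the made-up per-layer cost numbers, the unit resharding cost, and modeling an overlapped layer's cost as max(compute, communication) are assumptions, not the paper's actual cost model or algorithm.

```python
# Hypothetical per-layer costs (milliseconds) for two candidate sharding
# strategies per layer; these numbers are made up for illustration only.
#   strategy -> (compute_time, communication_time)
LAYER_COSTS = [
    {"row": (4.0, 3.0), "col": (4.0, 1.5)},
    {"row": (6.0, 2.0), "col": (5.0, 4.0)},
    {"row": (3.0, 1.0), "col": (3.5, 0.5)},
]

def reshard_cost(prev, cur):
    # Hypothetical cost of converting between shardings of consecutive layers.
    return 0.0 if prev == cur else 1.0

def search(layer_costs):
    """Dynamic programming over per-layer sharding choices.

    With two interleaved model copies, a layer's effective cost is taken as
    max(compute, communication) rather than their sum, reflecting that one
    copy's communication can hide behind the other copy's computation.
    """
    # dp[s] = (best total time ending with strategy s, chosen strategy path)
    dp = {s: (max(c), [s]) for s, c in layer_costs[0].items()}
    for layer in layer_costs[1:]:
        new_dp = {}
        for s, (compute, comm) in layer.items():
            step = max(compute, comm)
            new_dp[s] = min(
                (t + reshard_cost(prev, s) + step, path + [s])
                for prev, (t, path) in dp.items()
            )
        dp = new_dp
    return min(dp.values())

if __name__ == "__main__":
    total, plan = search(LAYER_COSTS)
    print(f"estimated step time: {total:.1f} ms, sharding plan: {plan}")
```

The recurrence is the usual layer-by-layer form: the best time for layer i under strategy s is the best time for layer i-1 under some strategy s', plus the cost of resharding between the two, plus the overlapped cost of layer i.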
