Links for fulltext
(May Require Subscription)
- Publisher Website (DOI): 10.1109/INFOCOM42981.2021.9488678
- Scopus: eid_2-s2.0-85111944048
- Web of Science: WOS:000702210400015
Conference Paper: Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training
Field | Value |
---|---|
Title | Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training |
Authors | Zhang, Z; Wu, C; Li, Z |
Issue Date | 2021 |
Publisher | IEEE Computer Society. The conference proceedings are available at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359 |
Citation | IEEE International Conference on Computer Communications (INFOCOM), Virtual Conference, Vancouver, BC, Canada, 10-13 May 2021, p. 1-10 |
Abstract | Distributed machine learning with multiple concurrent workers has been widely adopted to train large deep neural networks (DNNs). Parameter synchronization is a key component in each iteration of distributed training, where workers exchange locally computed gradients through an AllReduce operation or parameter servers for global parameter updates. Parameter synchronization often constitutes a significant portion of the training time; minimizing the communication time contributes substantially to DNN training speed-up. Standard ring-based AllReduce and the parameter server (PS) architecture work efficiently mostly under homogeneous inter-worker connectivity. However, available bandwidth among workers in real-world clusters is often heterogeneous, due to different hardware configurations, switching topologies, and contention with concurrent jobs. This work investigates the best parameter synchronization topology and schedule among workers for the most expedited communication in distributed DNN training. We show that the optimal parameter synchronization topology should consist of trees rooted at different workers, each aggregating or broadcasting a partition of the gradients/parameters. We identify a near-optimal forest packing that maximally utilizes available bandwidth and overlaps the aggregation and broadcast stages to minimize communication time. We provide a theoretical analysis of the performance bound and show, through extensive evaluation under various settings, that our scheme outperforms state-of-the-art parameter synchronization schemes by up to 18.3 times. |
Persistent Identifier | http://hdl.handle.net/10722/301414 |
ISSN | 0743-166X (2023 SCImago Journal Rankings: 2.865) |
ISI Accession Number ID | WOS:000702210400015 |
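The synchronization pattern described in the abstract, in which each gradient partition is aggregated up a tree rooted at a different worker and then broadcast back down, can be sketched roughly as follows. This is an illustrative simplification, not the paper's forest-packing scheme: the one-level star trees, the round-robin partitioning, and the function names (`build_tree`, `synchronize`) are all assumptions made for exposition.

```python
def build_tree(root, workers):
    """A trivial one-level tree: every non-root worker is a direct child.

    The paper packs trees adapted to heterogeneous bandwidth; a star
    shape is used here only to keep the sketch short.
    """
    return {root: [w for w in workers if w != root]}


def synchronize(grads):
    """grads: {worker_id: [float, ...]} local gradients of equal length.

    Returns {worker_id: [float, ...]} where every worker holds the
    globally aggregated (summed) gradient, computed partition by
    partition over per-root trees.
    """
    workers = sorted(grads)
    n = len(grads[workers[0]])
    k = len(workers)
    # Round-robin partitioning: partition `root` owns indices root, root+k, ...
    parts = {root: list(range(root, n, k)) for root in workers}
    result = {w: [0.0] * n for w in workers}
    for root, idxs in parts.items():
        tree = build_tree(root, workers)
        # Aggregation stage: the root sums its own values with those
        # sent up by its children in the tree.
        agg = {i: grads[root][i] + sum(grads[c][i] for c in tree[root])
               for i in idxs}
        # Broadcast stage: the root pushes the aggregated partition
        # back down the tree to every worker.
        for w in workers:
            for i in idxs:
                result[w][i] = agg[i]
    return result
```

With star trees every partition still traverses the root's links, so this sketch does not capture the bandwidth-aware tree shapes or the overlap of aggregation and broadcast that the paper optimizes; it only shows the per-root, per-partition structure.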
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Z | - |
dc.contributor.author | Wu, C | - |
dc.contributor.author | Li, Z | - |
dc.date.accessioned | 2021-07-27T08:10:43Z | - |
dc.date.available | 2021-07-27T08:10:43Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | IEEE International Conference on Computer Communications (INFOCOM), Virtual Conference, Vancouver, BC, Canada, 10-13 May 2021, p. 1-10 | - |
dc.identifier.issn | 0743-166X | - |
dc.identifier.uri | http://hdl.handle.net/10722/301414 | - |
dc.description.abstract | Distributed machine learning with multiple concurrent workers has been widely adopted to train large deep neural networks (DNNs). Parameter synchronization is a key component in each iteration of distributed training, where workers exchange locally computed gradients through an AllReduce operation or parameter servers for global parameter updates. Parameter synchronization often constitutes a significant portion of the training time; minimizing the communication time contributes substantially to DNN training speed-up. Standard ring-based AllReduce and the parameter server (PS) architecture work efficiently mostly under homogeneous inter-worker connectivity. However, available bandwidth among workers in real-world clusters is often heterogeneous, due to different hardware configurations, switching topologies, and contention with concurrent jobs. This work investigates the best parameter synchronization topology and schedule among workers for the most expedited communication in distributed DNN training. We show that the optimal parameter synchronization topology should consist of trees rooted at different workers, each aggregating or broadcasting a partition of the gradients/parameters. We identify a near-optimal forest packing that maximally utilizes available bandwidth and overlaps the aggregation and broadcast stages to minimize communication time. We provide a theoretical analysis of the performance bound and show, through extensive evaluation under various settings, that our scheme outperforms state-of-the-art parameter synchronization schemes by up to 18.3 times. | - |
dc.language | eng | - |
dc.publisher | IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359 | - |
dc.relation.ispartof | IEEE INFOCOM - IEEE Conference on Computer Communications | - |
dc.rights | IEEE INFOCOM - IEEE Conference on Computer Communications. Copyright © IEEE Computer Society. | - |
dc.rights | ©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | - |
dc.title | Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wu, C: cwu@cs.hku.hk | - |
dc.identifier.authority | Wu, C=rp01397 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/INFOCOM42981.2021.9488678 | - |
dc.identifier.scopus | eid_2-s2.0-85111944048 | - |
dc.identifier.hkuros | 323509 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 10 | - |
dc.identifier.isi | WOS:000702210400015 | - |
dc.publisher.place | United States | - |