Conference Paper: Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training

Title: Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training
Authors: Zhang, Z; Wu, C; Li, Z
Issue Date: 2021
Publisher: IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359
Citation: IEEE International Conference on Computer Communications (INFOCOM), Virtual Conference, Vancouver, BC, Canada, 10-13 May 2021, p. 1-10
Abstract: Distributed machine learning with multiple concurrent workers has been widely adopted to train large deep neural networks (DNNs). Parameter synchronization is a key component in each iteration of distributed training, where workers exchange locally computed gradients through an AllReduce operation or parameter servers for global parameter updates. Parameter synchronization often constitutes a significant portion of the training time; minimizing the communication time contributes substantially to DNN training speed-up. Standard ring-based AllReduce or PS architectures work efficiently mostly with homogeneous inter-worker connectivity. However, available bandwidth among workers in real-world clusters is often heterogeneous, due to different hardware configurations, switching topologies, and contention with concurrent jobs. This work investigates the best parameter synchronization topology and schedule among workers for the most expedited communication in distributed DNN training. We show that the optimal parameter synchronization topology should be composed of trees with different workers as roots, each aggregating or broadcasting a partition of the gradients/parameters. We identify a near-optimal forest packing to maximally utilize available bandwidth and overlap the aggregation and broadcast stages to minimize communication time. We provide a theoretical analysis of the performance bound and show, through extensive evaluation under various settings, that our scheme outperforms state-of-the-art parameter synchronization schemes by up to 18.3 times.
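
The partition-per-root structure described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes four workers and trivial one-hop trees, where each gradient partition is summed at a distinct root worker (aggregation) and the reduced partition is then sent back to all workers (broadcast). The worker count, gradient length, and helper names are illustrative assumptions.

    import numpy as np

    NUM_WORKERS = 4   # illustrative worker count (assumption)
    GRAD_LEN = 8      # toy gradient length, divisible by NUM_WORKERS

    # Each worker holds a locally computed gradient (toy values).
    local_grads = [np.arange(GRAD_LEN, dtype=float) + w for w in range(NUM_WORKERS)]

    # Split a gradient into NUM_WORKERS partitions; partition p is handled by
    # the tree rooted at worker p (one root per partition, as in the abstract).
    def split_gradient(grad):
        return np.split(grad, NUM_WORKERS)

    # Aggregation stage: the root of tree p sums partition p across all workers.
    reduced_partitions = [
        sum(split_gradient(local_grads[w])[p] for w in range(NUM_WORKERS))
        for p in range(NUM_WORKERS)
    ]

    # Broadcast stage: each root sends its reduced partition to every worker,
    # which concatenates the pieces into the globally aggregated gradient.
    global_grad = np.concatenate(reduced_partitions)
    synced_grads = [global_grad.copy() for _ in range(NUM_WORKERS)]

    # Every worker now holds the same aggregated gradient.
    assert all(np.allclose(g, global_grad) for g in synced_grads)
    print(global_grad)

In the paper's setting, the trees would have multiple levels adapted to heterogeneous inter-worker bandwidth, and the aggregation and broadcast stages of different partitions would overlap; the sketch only shows the one-root-per-partition idea.
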
Persistent Identifier: http://hdl.handle.net/10722/301414
ISSN: 0743-166X
2020 SCImago Journal Rankings: 1.183
ISI Accession Number ID: WOS:000702210400015

 

DC Field | Value | Language
dc.contributor.author | Zhang, Z | -
dc.contributor.author | Wu, C | -
dc.contributor.author | Li, Z | -
dc.date.accessioned | 2021-07-27T08:10:43Z | -
dc.date.available | 2021-07-27T08:10:43Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | IEEE International Conference on Computer Communications (INFOCOM), Virtual Conference, Vancouver, BC, Canada, 10-13 May 2021, p. 1-10 | -
dc.identifier.issn | 0743-166X | -
dc.identifier.uri | http://hdl.handle.net/10722/301414 | -
dc.description.abstract | Distributed machine learning with multiple concurrent workers has been widely adopted to train large deep neural networks (DNNs). Parameter synchronization is a key component in each iteration of distributed training, where workers exchange locally computed gradients through an AllReduce operation or parameter servers for global parameter updates. Parameter synchronization often constitutes a significant portion of the training time; minimizing the communication time contributes substantially to DNN training speed-up. Standard ring-based AllReduce or PS architectures work efficiently mostly with homogeneous inter-worker connectivity. However, available bandwidth among workers in real-world clusters is often heterogeneous, due to different hardware configurations, switching topologies, and contention with concurrent jobs. This work investigates the best parameter synchronization topology and schedule among workers for the most expedited communication in distributed DNN training. We show that the optimal parameter synchronization topology should be composed of trees with different workers as roots, each aggregating or broadcasting a partition of the gradients/parameters. We identify a near-optimal forest packing to maximally utilize available bandwidth and overlap the aggregation and broadcast stages to minimize communication time. We provide a theoretical analysis of the performance bound and show, through extensive evaluation under various settings, that our scheme outperforms state-of-the-art parameter synchronization schemes by up to 18.3 times. | -
dc.language | eng | -
dc.publisher | IEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359 | -
dc.relation.ispartof | IEEE INFOCOM - IEEE Conference on Computer Communications | -
dc.rights | IEEE INFOCOM - IEEE Conference on Computer Communications. Copyright © IEEE Computer Society. | -
dc.rights | ©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.title | Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training | -
dc.type | Conference_Paper | -
dc.identifier.email | Wu, C: cwu@cs.hku.hk | -
dc.identifier.authority | Wu, C=rp01397 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/INFOCOM42981.2021.9488678 | -
dc.identifier.scopus | eid_2-s2.0-85111944048 | -
dc.identifier.hkuros | 323509 | -
dc.identifier.spage | 1 | -
dc.identifier.epage | 10 | -
dc.identifier.isi | WOS:000702210400015 | -
dc.publisher.place | United States | -
