Conference Paper: Optimizing distributed training deployment in heterogeneous GPU clusters
Title | Optimizing distributed training deployment in heterogeneous GPU clusters |
---|---|
Authors | Yi, X; Zhang, S; Luo, Z; Long, G; Diao, L; Wu, C; Zheng, Z; Yang, J; Lin, W |
Keywords | Distributed training; heterogeneous environment; deep learning |
Issue Date | 2020 |
Publisher | Association for Computing Machinery (ACM) |
Citation | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107 |
Abstract | This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly efficient distributed training over heterogeneous devices, by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. |
Persistent Identifier | http://hdl.handle.net/10722/301293 |
ISBN | 9781450379489 |
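
The abstract points out that classic data parallelism performs poorly when the GPUs in a cluster differ in speed. As a minimal illustrative sketch, assuming hypothetical device names and throughput figures rather than HeteroG's published strategy framework or API, the snippet below shows the simplest heterogeneity-aware adjustment: splitting a global batch across devices in proportion to measured throughput so that per-step compute time is roughly balanced.

```python
# Illustrative sketch only (not HeteroG's algorithm): split a global batch across
# heterogeneous GPUs in proportion to measured throughput, so per-step compute
# time is roughly balanced instead of being gated by the slowest device.

def split_batch(global_batch: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Assign each device a batch share proportional to its samples/sec."""
    total = sum(throughputs.values())
    shares = {dev: round(global_batch * t / total) for dev, t in throughputs.items()}
    # Absorb rounding drift so the shares sum exactly to the global batch size.
    shares[max(throughputs, key=throughputs.get)] += global_batch - sum(shares.values())
    return shares

if __name__ == "__main__":
    # Hypothetical cluster: V100s roughly 3x faster than a K80 on this workload.
    gpus = {"v100-0": 900.0, "v100-1": 900.0, "k80-0": 300.0}  # samples/sec
    print(split_batch(1024, gpus))  # {'v100-0': 439, 'v100-1': 439, 'k80-0': 146}
```

HeteroG's search goes well beyond this per-device split, jointly choosing operation-level parallelism, communication architecture and execution schedule via GNN-based learning and combinatorial optimization, as described in the abstract above.
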
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yi, X | - |
dc.contributor.author | Zhang, S | - |
dc.contributor.author | Luo, Z | - |
dc.contributor.author | Long, G | - |
dc.contributor.author | Diao, L | - |
dc.contributor.author | Wu, C | - |
dc.contributor.author | Zheng, Z | - |
dc.contributor.author | Yang, J | - |
dc.contributor.author | Lin, W | - |
dc.date.accessioned | 2021-07-27T08:08:58Z | - |
dc.date.available | 2021-07-27T08:08:58Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107 | - |
dc.identifier.isbn | 9781450379489 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301293 | - |
dc.description.abstract | This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly efficient distributed training over heterogeneous devices, by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. | -
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM) | - |
dc.relation.ispartof | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies | - |
dc.subject | Distributed training | - |
dc.subject | heterogeneous environment | - |
dc.subject | deep learning | - |
dc.title | Optimizing distributed training deployment in heterogeneous GPU clusters | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wu, C: cwu@cs.hku.hk | - |
dc.identifier.authority | Wu, C=rp01397 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3386367.3432728 | - |
dc.identifier.scopus | eid_2-s2.0-85097614872 | - |
dc.identifier.hkuros | 323512 | - |
dc.identifier.spage | 93 | - |
dc.identifier.epage | 107 | - |
dc.publisher.place | New York, NY | - |