Conference Paper: Optimizing distributed training deployment in heterogeneous GPU clusters
Title | Optimizing distributed training deployment in heterogeneous GPU clusters |
---|---|
Authors | Yi, X; Zhang, S; Luo, Z; Long, G; Diao, L; Wu, C; Zheng, Z; Yang, J; Lin, W |
Keywords | Distributed training; heterogeneous environment; deep learning |
Issue Date | 2020 |
Publisher | Association for Computing Machinery (ACM) |
Citation | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107 |
Abstract | This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly efficient distributed training over heterogeneous devices, by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. |
Persistent Identifier | http://hdl.handle.net/10722/301293 |
ISBN | 9781450379489 |
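
The abstract points out that classic data parallelism performs poorly when the GPUs in a cluster differ in speed. As a minimal illustrative sketch, assuming hypothetical device names and throughput figures rather than HeteroG's published strategy framework or API, the snippet below shows the simplest heterogeneity-aware adjustment: splitting a global batch across devices in proportion to measured throughput so that per-step compute time is roughly balanced.

```python
# Illustrative sketch only (not HeteroG's algorithm): split a global batch across
# heterogeneous GPUs in proportion to measured throughput, so per-step compute
# time is roughly balanced instead of being gated by the slowest device.

def split_batch(global_batch: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Assign each device a batch share proportional to its samples/sec."""
    total = sum(throughputs.values())
    shares = {dev: round(global_batch * t / total) for dev, t in throughputs.items()}
    # Absorb rounding drift so the shares sum exactly to the global batch size.
    shares[max(throughputs, key=throughputs.get)] += global_batch - sum(shares.values())
    return shares

if __name__ == "__main__":
    # Hypothetical cluster: V100s roughly 3x faster than a K80 on this workload.
    gpus = {"v100-0": 900.0, "v100-1": 900.0, "k80-0": 300.0}  # samples/sec
    print(split_batch(1024, gpus))  # {'v100-0': 439, 'v100-1': 439, 'k80-0': 146}
```

HeteroG's search goes well beyond this per-device split, jointly choosing operation-level parallelism, communication architecture and execution schedule via GNN-based learning and combinatorial optimization, as described in the abstract above.
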
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yi, X | - |
dc.contributor.author | Zhang, S | - |
dc.contributor.author | Luo, Z | - |
dc.contributor.author | Long, G | - |
dc.contributor.author | Diao, L | - |
dc.contributor.author | Wu, C | - |
dc.contributor.author | Zheng, Z | - |
dc.contributor.author | Yang, J | - |
dc.contributor.author | Lin, W | - |
dc.date.accessioned | 2021-07-27T08:08:58Z | - |
dc.date.available | 2021-07-27T08:08:58Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107 | - |
dc.identifier.isbn | 9781450379489 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301293 | - |
dc.description.abstract | This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly efficient distributed training over heterogeneous devices, by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. | -
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM) | - |
dc.relation.ispartof | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies | - |
dc.subject | Distributed training | - |
dc.subject | heterogeneous environment | - |
dc.subject | deep learning | - |
dc.title | Optimizing distributed training deployment in heterogeneous GPU clusters | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wu, C: cwu@cs.hku.hk | - |
dc.identifier.authority | Wu, C=rp01397 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3386367.3432728 | - |
dc.identifier.scopus | eid_2-s2.0-85097614872 | - |
dc.identifier.hkuros | 323512 | - |
dc.identifier.spage | 93 | - |
dc.identifier.epage | 107 | - |
dc.publisher.place | New York, NY | - |