Conference Paper: DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

Title: DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Authors: Fan, S; Rong, Y; Meng, C; Cao, Z; Wang, S; Zheng, Z; Wu, C; Long, G; Yang, J; Xia, L; Diao, L; Liu, X; Lin, W
Keywords: deep learning; data parallelism; pipeline parallelism; hybrid parallelism
Issue Date: 2021
Publisher: Association for Computing Machinery (ACM)
Citation: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '21), Virtual Conference, Republic of Korea, 27 February 2021, p. 431-445
Abstract: It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are still several tricky issues to address: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner to solve the partition and placement problems, and explores the optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the DAPPLE planner consistently outperforms strategies generated by PipeDream's planner, achieving up to a 3.23× speedup in synchronous training scenarios, and the DAPPLE runtime outperforms GPipe with a 1.6× speedup in training throughput while also reducing memory consumption by 12%.
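As a rough illustration of the hybrid strategies described in the abstract, the sketch below enumerates candidate ways to split a model's layers into pipeline stages and replicate each stage across data-parallel groups, then ranks them with a toy cost proxy. This is only a hedged sketch of the general idea: the even layer split, the cost terms, and all function names are assumptions made here for illustration, not DAPPLE's actual planner or cost model.

```python
"""Illustrative sketch (not DAPPLE's planner) of a hybrid data/pipeline
parallelism search space: split layers into pipeline stages, replicate
each stage across data-parallel groups, and score candidates with a
toy cost proxy. All numbers and cost terms are made up."""


def enumerate_plans(num_layers, num_devices):
    """Yield (num_stages, replicas_per_stage, stage_sizes) candidates."""
    for num_stages in range(1, num_devices + 1):
        if num_devices % num_stages != 0 or num_stages > num_layers:
            continue
        replicas = num_devices // num_stages   # data-parallel width per stage
        # Even layer split; a real planner would also search uneven partitions.
        base, extra = divmod(num_layers, num_stages)
        sizes = [base + (1 if s < extra else 0) for s in range(num_stages)]
        yield num_stages, replicas, sizes


def toy_cost(num_stages, replicas, stage_sizes, layer_time=1.0, allreduce_cost=0.5):
    """Toy per-step cost: the slowest stage bounds the pipeline, gradient
    sync grows with the number of replicas, and pipeline fill adds a bubble."""
    bottleneck = max(stage_sizes) * layer_time
    sync = allreduce_cost * (replicas - 1)
    bubble = (num_stages - 1) * layer_time
    return bottleneck + sync + bubble


if __name__ == "__main__":
    for stages, replicas, sizes in enumerate_plans(num_layers=8, num_devices=8):
        print(f"stages={stages} replicas/stage={replicas} split={sizes} "
              f"cost~{toy_cost(stages, replicas, sizes):.2f}")
```

A second sketch, placed after the DC metadata table below, illustrates the memory effect of scheduling backward passes early, which is one plausible reading of the runtime scheduling algorithm mentioned in the abstract.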
Persistent Identifier: http://hdl.handle.net/10722/301415
ISBN: 9781450382946

 

DC Field / Value
dc.contributor.author: Fan, S
dc.contributor.author: Rong, Y
dc.contributor.author: Meng, C
dc.contributor.author: Cao, Z
dc.contributor.author: Wang, S
dc.contributor.author: Zheng, Z
dc.contributor.author: Wu, C
dc.contributor.author: Long, G
dc.contributor.author: Yang, J
dc.contributor.author: Xia, L
dc.contributor.author: Diao, L
dc.contributor.author: Liu, X
dc.contributor.author: Lin, W
dc.date.accessioned: 2021-07-27T08:10:44Z
dc.date.available: 2021-07-27T08:10:44Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '21), Virtual Conference, Republic of Korea, 27 February 2021, p. 431-445
dc.identifier.isbn: 9781450382946
dc.identifier.uri: http://hdl.handle.net/10722/301415
dc.description.abstract: It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are still several tricky issues to address: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner to solve the partition and placement problems, and explores the optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the DAPPLE planner consistently outperforms strategies generated by PipeDream's planner, achieving up to a 3.23× speedup in synchronous training scenarios, and the DAPPLE runtime outperforms GPipe with a 1.6× speedup in training throughput while also reducing memory consumption by 12%.
dc.language: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.ispartof: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
dc.subject: deep learning
dc.subject: data parallelism
dc.subject: pipeline parallelism
dc.subject: hybrid parallelism
dc.title: DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
dc.type: Conference_Paper
dc.identifier.email: Wu, C: cwu@cs.hku.hk
dc.identifier.authority: Wu, C=rp01397
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3437801.3441593
dc.identifier.scopus: eid_2-s2.0-85101713868
dc.identifier.hkuros: 323510
dc.identifier.spage: 431
dc.identifier.epage: 445
dc.publisher.place: New York, NY
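
The abstract above also mentions a runtime scheduling algorithm that lowers device memory usage without sacrificing throughput. The sketch below is a simplified, hypothetical illustration of early-backward (1F1B-style) micro-batch scheduling on a single pipeline stage, not DAPPLE's actual scheduler: it only counts how many micro-batches' activations are live at once under two schedule shapes.

```python
"""Simplified illustration (not DAPPLE's scheduler) of why running backward
passes early caps peak activation memory on a pipeline stage. A forward for
micro-batch i stores activations that the matching backward frees."""


def peak_live_activations(schedule):
    """schedule: list of ('F', i) / ('B', i) ops for micro-batch i."""
    live, peak = set(), 0
    for op, mb in schedule:
        if op == 'F':
            live.add(mb)       # forward stores activations for micro-batch mb
        else:
            live.discard(mb)   # backward frees them
        peak = max(peak, len(live))
    return peak


def gpipe_style(num_microbatches):
    """All forwards first, then all backwards."""
    return ([('F', i) for i in range(num_microbatches)]
            + [('B', i) for i in range(num_microbatches)])


def early_backward(num_microbatches, warmup):
    """Run `warmup` forwards, then alternate one backward with one forward."""
    sched, next_f, next_b = [], 0, 0
    for _ in range(min(warmup, num_microbatches)):
        sched.append(('F', next_f)); next_f += 1
    while next_b < num_microbatches:
        sched.append(('B', next_b)); next_b += 1
        if next_f < num_microbatches:
            sched.append(('F', next_f)); next_f += 1
    return sched


if __name__ == "__main__":
    M, warmup = 8, 2  # warmup would normally depend on the stage's position
    print("GPipe-style peak live activations:", peak_live_activations(gpipe_style(M)))               # 8
    print("Early-backward peak live activations:", peak_live_activations(early_backward(M, warmup)))  # 2
```

With 8 micro-batches the GPipe-style order keeps all 8 activation sets alive at its peak, while the early-backward order caps the peak at the warmup depth; this illustrates how schedule order alone can bound activation memory, in the spirit of the memory savings the abstract reports, and independently of whether re-computation is also used.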

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats