Field | Value |
---|---|
Title | MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism |
Authors | Zhao, Shixiong (赵世雄) |
Advisors | Cui, H |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhao, S. [赵世雄]. (2022). MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep learning has driven rapid progress in machine learning tasks such as computer vision, natural language processing, and computational biology. Meanwhile, the computational resources, i.e., both computation time and memory space, needed by deep learning applications, i.e., one or more Deep Neural Networks (DNNs), are increasing rapidly, far beyond the capability of a single device/host. This expedites distributed deep learning, where deep learning applications are often deployed in a pipelined manner: the whole application is partitioned into a set of data processing elements (stages) connected in series; stages are placed on different devices/hosts; and the output of one stage is the input of the next. The decisive factors in the efficiency of pipeline systems for DNN training/serving, especially under various runtime dynamicities, are manifold, including the stage partition strategy, the pipeline scheduling algorithm, load balancing in case of load changes, working-set memory management of the extensive context used by all tasks executed concurrently in a pipeline, and recovery time and recovery correctness in case of failure. Unfortunately, our study found that despite much effort invested in building pipeline systems for deep learning, systems in the existing literature are sub-optimal in efficiency, focus on only one of the aforementioned factors, and fail to maintain quality of service when dynamicities (e.g., load changes and failures) arise. This thesis showcases the design and implementation of MINDPIPE, a high-performance and carbon-efficient training system for general large AI models with four-dimensional parallelism (data, tensor, pipeline, and supernet parallelism) and high resilience to various dynamicities, including structural changes in DNN training, sparsely activated computation in DNN training, and host/device failure in DNN serving. The thesis first presents VPIPE, the pipeline-parallel DNN training component of MINDPIPE, which tackles a mismatch (imbalance) between pipeline stages' computational load partitioning and memory load partitioning and achieves high performance even under runtime structural changes of the trained DNNs. The second part shows how MINDPIPE supports high-performance three-dimensional (3D: data, tensor, and pipeline) parallel DNN training by using a pipeline schedule that parallelizes the computational and communication tasks in 3D training of large DNN models (e.g., Transformers) and by optimizing the schedule with a set of GPU memory offloading techniques. The third component, NASPIPE, extends the high-performance and resilient pipeline-parallel training support to the supernet dimension (the fourth dimension), which is frequently adopted in Neural Architecture Search, where sub-DNNs are sparsely activated from a super-DNN. Finally, HAMS is a DNN serving component that serves the trained DNNs in a serving pipeline (graph) and provides high availability (resilience to host failure) with negligible normal-case latency penalties. (Illustrative sketches of these pipeline mechanisms follow this record.) |
Degree | Doctor of Philosophy |
Subject | Deep learning (Machine learning); Neural networks (Computer science) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/328587 |
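The abstract describes pipeline-parallel deployment: a model is partitioned into stages connected in series, each stage pinned to its own device, with each stage's output feeding the next. A minimal sketch of that structure follows, in plain Python with trivial callables standing in for model partitions; every name here is illustrative and not MINDPIPE's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    """One pipeline stage: a model partition pinned to one device."""
    name: str
    device: str
    fn: Callable[[float], float]  # stand-in for the partition's forward pass

def run_pipeline(stages: List[Stage], microbatches: List[float]) -> List[float]:
    """Feed each microbatch through the stage chain: stage i's output is
    stage i+1's input, mirroring the series connection in the abstract."""
    outputs = []
    for x in microbatches:
        for stage in stages:
            x = stage.fn(x)  # on a real system this runs on stage.device
        outputs.append(x)
    return outputs

# Hypothetical 3-stage partition of a model across 3 devices.
stages = [
    Stage("embed", "gpu:0", lambda x: x * 2.0),
    Stage("blocks", "gpu:1", lambda x: x + 1.0),
    Stage("head", "gpu:2", lambda x: x * 0.5),
]
print(run_pipeline(stages, [1.0, 2.0, 3.0]))
```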
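VPIPE is described as tackling the mismatch between computational-load and memory-load partitioning across stages. One way to picture that tension is a partitioner that scores every contiguous split on both axes and minimizes the worst stage's combined load. The cost model and numbers below are toy assumptions for illustration, not VPIPE's actual algorithm.

```python
from itertools import combinations

# Per-layer (compute_cost, memory_cost); toy numbers, not measured values.
layers = [(4, 1), (3, 6), (5, 2), (2, 7), (6, 3), (1, 5)]

def stage_load(stage):
    """A stage is overloaded if EITHER its compute or its memory sum is
    high, hence the max of the two: balancing only one axis is not enough."""
    compute = sum(c for c, _ in stage)
    memory = sum(m for _, m in stage)
    return max(compute, memory)

def best_partition(layers, num_stages):
    """Exhaustively try contiguous split points and keep the partition
    whose most loaded stage (by the combined score) is smallest."""
    n = len(layers)
    best, best_cost = None, float("inf")
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0, *cuts, n)
        stages = [layers[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = max(stage_load(s) for s in stages)
        if cost < best_cost:
            best, best_cost = stages, cost
    return best, best_cost

stages, cost = best_partition(layers, 3)
print(cost, stages)
```

A partitioner balancing compute alone would happily co-locate the memory-heavy layers; scoring both axes is what keeps the pipeline from stalling on either.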
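The second component overlaps computational and communication tasks in the pipeline schedule. The common systems idiom is to launch the transfer of one microbatch's output asynchronously and compute the next microbatch while it is in flight. Below is a thread-based stand-in; real 3D-parallel systems use CUDA streams and collectives such as NCCL, which this sketch does not model.

```python
import threading
import time

def compute(mb):
    time.sleep(0.01)   # stand-in for a forward/backward pass
    return mb * 2

def send(result):
    time.sleep(0.01)   # stand-in for an inter-stage transfer

def pipeline_step(microbatches):
    """Overlap: while microbatch i's output is being sent to the next
    stage, compute microbatch i+1 instead of waiting for the transfer."""
    pending = None
    for mb in microbatches:
        out = compute(mb)            # compute overlaps the in-flight send
        if pending:
            pending.join()           # wait only now, after useful work
        pending = threading.Thread(target=send, args=(out,))
        pending.start()
    if pending:
        pending.join()

start = time.perf_counter()
pipeline_step(list(range(8)))
# ~0.09s here vs ~0.16s if compute and send ran strictly back to back.
print(f"overlapped: {time.perf_counter() - start:.3f}s")
```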
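NASPIPE's supernet dimension rests on sparse activation: a super-DNN holds many candidate operators, but each training step executes only one sampled sub-DNN path. A toy sketch of that sampling pattern, with dictionaries of lambdas standing in for candidate operators; none of this reflects NASPIPE's internals.

```python
import random

# A toy supernet: each layer offers candidate ops; a sub-DNN activates one.
supernet = [
    {"conv3": lambda x: x + 1, "conv5": lambda x: x + 2, "skip": lambda x: x},
    {"conv3": lambda x: x * 2, "conv5": lambda x: x * 3, "skip": lambda x: x},
]

def sample_subnet(supernet):
    """NAS-style sampling: pick one op per layer; every other op in the
    super-DNN stays inactive this step (the 'sparse activation')."""
    return [random.choice(list(layer.items())) for layer in supernet]

x = 1.0
arch = sample_subnet(supernet)
for name, op in arch:
    x = op(x)   # only the sampled path executes
print([name for name, _ in arch], x)
```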
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Cui, H | - |
dc.contributor.author | Zhao, Shixiong | - |
dc.contributor.author | 赵世雄 | - |
dc.date.accessioned | 2023-06-29T05:44:28Z | - |
dc.date.available | 2023-06-29T05:44:28Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Zhao, S. [赵世雄]. (2022). MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/328587 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044695782503414 | - |