Field | Value |
---|---|
Title | MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism |
Authors | Zhao, Shixiong (赵世雄) |
Advisors | Cui, H |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhao, S. [赵世雄]. (2022). MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep learning has driven rapid progress in machine learning tasks such as computer vision, natural language processing, and computational biology. Meanwhile, the computational resources, i.e., both computation time and memory space, needed by deep learning applications, i.e., one or more Deep Neural Networks (DNNs), are increasing rapidly, far beyond the capability of a single device/host. This expedites distributed deep learning, where deep learning applications are often deployed in a pipelined manner: the whole application is partitioned into a set of data processing elements (stages) connected in series; stages are placed on different devices/hosts; and the output of one stage is the input of the next. The decisive factors in the efficiency of pipeline systems for DNN training/serving, especially under various runtime dynamicities, are manifold, including the stage partition strategy, the pipeline scheduling algorithm, load balancing in case of load changes, working-set memory management of the extensive context used by all tasks executed concurrently in a pipeline, and recovery time and recovery correctness in case of failure. Unfortunately, our study found that despite much effort invested in building pipeline systems for deep learning, systems in the existing literature are sub-optimal in efficiency, focus on only one of the aforementioned factors, and fail to maintain quality of service when dynamicities (e.g., load changes and failures) arise. This thesis showcases the design and implementation of MINDPIPE, a high-performance and carbon-efficient training system for general large AI models with four-dimensional parallelism (data, tensor, pipeline, and supernet parallelism) and high resilience to various dynamicities, including structural changes in DNN training, sparsely activated computation in DNN training, and host/device failure in DNN serving. The thesis first presents VPIPE, the pipeline-parallel DNN training component of MINDPIPE, which tackles a mismatch (imbalance) between pipeline stages' computational load partitioning and memory load partitioning and achieves high performance even under runtime structural changes of the trained DNNs. The second part shows how MINDPIPE supports high-performance three-dimensional (3D: data, tensor, and pipeline) parallel DNN training by using a pipeline schedule that parallelizes the computational and communication tasks in 3D training of large DNN models (e.g., Transformers) and by optimizing the schedule with a set of GPU memory offloading techniques. The third component, NASPIPE, extends the high-performance and resilient pipeline-parallel training support to the supernet dimension (the fourth dimension), which is frequently adopted in Neural Architecture Search, where sub-DNNs are sparsely activated from a super-DNN. Finally, HAMS is a DNN serving component that serves the trained DNNs in a serving pipeline (graph) and provides high availability (resilience to host failure) with negligible normal-case latency penalties. (Illustrative sketches of these pipeline mechanisms follow this record.) |
Degree | Doctor of Philosophy |
Subject | Deep learning (Machine learning); Neural networks (Computer science) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/328587 |
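The abstract describes pipeline-parallel deployment: a model is partitioned into stages connected in series, each stage pinned to its own device, with each stage's output feeding the next. A minimal sketch of that structure follows, in plain Python with trivial callables standing in for model partitions; every name here is illustrative and not MINDPIPE's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    """One pipeline stage: a model partition pinned to one device."""
    name: str
    device: str
    fn: Callable[[float], float]  # stand-in for the partition's forward pass

def run_pipeline(stages: List[Stage], microbatches: List[float]) -> List[float]:
    """Feed each microbatch through the stage chain: stage i's output is
    stage i+1's input, mirroring the series connection in the abstract."""
    outputs = []
    for x in microbatches:
        for stage in stages:
            x = stage.fn(x)  # on a real system this runs on stage.device
        outputs.append(x)
    return outputs

# Hypothetical 3-stage partition of a model across 3 devices.
stages = [
    Stage("embed", "gpu:0", lambda x: x * 2.0),
    Stage("blocks", "gpu:1", lambda x: x + 1.0),
    Stage("head", "gpu:2", lambda x: x * 0.5),
]
print(run_pipeline(stages, [1.0, 2.0, 3.0]))
```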
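VPIPE is described as tackling the mismatch between computational-load and memory-load partitioning across stages. One way to picture that tension is a partitioner that scores every contiguous split on both axes and minimizes the worst stage's combined load. The cost model and numbers below are toy assumptions for illustration, not VPIPE's actual algorithm.

```python
from itertools import combinations

# Per-layer (compute_cost, memory_cost); toy numbers, not measured values.
layers = [(4, 1), (3, 6), (5, 2), (2, 7), (6, 3), (1, 5)]

def stage_load(stage):
    """A stage is overloaded if EITHER its compute or its memory sum is
    high, hence the max of the two: balancing only one axis is not enough."""
    compute = sum(c for c, _ in stage)
    memory = sum(m for _, m in stage)
    return max(compute, memory)

def best_partition(layers, num_stages):
    """Exhaustively try contiguous split points and keep the partition
    whose most loaded stage (by the combined score) is smallest."""
    n = len(layers)
    best, best_cost = None, float("inf")
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0, *cuts, n)
        stages = [layers[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = max(stage_load(s) for s in stages)
        if cost < best_cost:
            best, best_cost = stages, cost
    return best, best_cost

stages, cost = best_partition(layers, 3)
print(cost, stages)
```

A partitioner balancing compute alone would happily co-locate the memory-heavy layers; scoring both axes is what keeps the pipeline from stalling on either.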
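The second component overlaps computational and communication tasks in the pipeline schedule. The common systems idiom is to launch the transfer of one microbatch's output asynchronously and compute the next microbatch while it is in flight. Below is a thread-based stand-in; real 3D-parallel systems use CUDA streams and collectives such as NCCL, which this sketch does not model.

```python
import threading
import time

def compute(mb):
    time.sleep(0.01)   # stand-in for a forward/backward pass
    return mb * 2

def send(result):
    time.sleep(0.01)   # stand-in for an inter-stage transfer

def pipeline_step(microbatches):
    """Overlap: while microbatch i's output is being sent to the next
    stage, compute microbatch i+1 instead of waiting for the transfer."""
    pending = None
    for mb in microbatches:
        out = compute(mb)            # compute overlaps the in-flight send
        if pending:
            pending.join()           # wait only now, after useful work
        pending = threading.Thread(target=send, args=(out,))
        pending.start()
    if pending:
        pending.join()

start = time.perf_counter()
pipeline_step(list(range(8)))
# ~0.09s here vs ~0.16s if compute and send ran strictly back to back.
print(f"overlapped: {time.perf_counter() - start:.3f}s")
```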
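NASPIPE's supernet dimension rests on sparse activation: a super-DNN holds many candidate operators, but each training step executes only one sampled sub-DNN path. A toy sketch of that sampling pattern, with dictionaries of lambdas standing in for candidate operators; none of this reflects NASPIPE's internals.

```python
import random

# A toy supernet: each layer offers candidate ops; a sub-DNN activates one.
supernet = [
    {"conv3": lambda x: x + 1, "conv5": lambda x: x + 2, "skip": lambda x: x},
    {"conv3": lambda x: x * 2, "conv5": lambda x: x * 3, "skip": lambda x: x},
]

def sample_subnet(supernet):
    """NAS-style sampling: pick one op per layer; every other op in the
    super-DNN stays inactive this step (the 'sparse activation')."""
    return [random.choice(list(layer.items())) for layer in supernet]

x = 1.0
arch = sample_subnet(supernet)
for name, op in arch:
    x = op(x)   # only the sampled path executes
print([name for name, _ in arch], x)
```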
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Cui, H | - |
dc.contributor.author | Zhao, Shixiong | - |
dc.contributor.author | 赵世雄 | - |
dc.date.accessioned | 2023-06-29T05:44:28Z | - |
dc.date.available | 2023-06-29T05:44:28Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Zhao, S. [赵世雄]. (2022). MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/328587 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | MINDPIPE : a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044695782503414 | - |