
Postgraduate thesis: MINDPIPE: a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism

Title: MINDPIPE: a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism
Authors: Zhao, Shixiong [赵世雄]
Advisors: Cui, H
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhao, S. [赵世雄]. (2022). MINDPIPE: a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Deep learning has driven rapid progress in machine learning tasks such as computer vision, natural language processing, and computational biology. Meanwhile, the computational resources, i.e., both computation time and memory space, needed by deep learning applications, i.e., one or more deep neural networks (DNNs), are increasing rapidly and far exceed the capability of a single device/host. This expedites distributed deep learning, where deep learning applications are often deployed in a pipelining manner: the whole application is partitioned into a set of data processing elements (stages) connected in series; the stages are placed on different devices/hosts; and the output of one stage is the input of the next. The decisive factors in the efficiency of pipeline systems for DNN training/serving, especially under various runtime dynamicities, are manifold: the stage partition strategy, the pipeline scheduling algorithm, load balancing in case of load changes, working-set memory management of the large context used by all the tasks concurrently executing in a pipeline, and recovery time and correctness in case of failure. Unfortunately, our study found that, despite much effort devoted to building pipeline systems for deep learning, systems in the existing literature are sub-optimal in efficiency, focus on only one aspect of the aforementioned factors, and fail to maintain quality of service when dynamicities (e.g., load changes and failures) arise.

This thesis presents the design and implementation of MINDPIPE, a high-performance and carbon-efficient training system for general large AI models with four-dimensional parallelism (data, tensor, pipeline, and supernet parallelism) and high resilience to various dynamicities, including structural changes in DNN training, sparsely activated computation in DNN training, and host/device failures in DNN serving. The thesis first presents VPIPE, a pipeline-parallel DNN training component of MINDPIPE that tackles the mismatch (imbalance) between pipeline stages' computational load partitioning and memory load partitioning and achieves high performance even when the trained DNNs change structurally at runtime. The second part shows how MINDPIPE supports high-performance three-dimensional (3D: data, tensor, and pipeline) parallel DNN training by using a pipeline schedule that parallelizes the computation and communication tasks in 3D training of large DNN models (e.g., Transformers) and by optimizing the schedule with a set of GPU memory offloading techniques. The third component, NASPIPE, extends the high-performance and resilient pipeline-parallel training support to the supernet training dimension (the fourth dimension), which is frequently adopted in Neural Architecture Search, where sub-DNNs are sparsely activated from a super-DNN. Last but not least, HAMS is a DNN serving component that serves the trained DNNs in a serving pipeline (graph) and provides high availability (resilience to host failures) with negligible normal-case latency penalties.
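The pipelining deployment model described in the abstract (stages connected in series across devices, with micro-batches streamed through them) can be made concrete with a short schedule sketch. The snippet below is purely illustrative and is not taken from the thesis: it prints a generic GPipe-style fill-drain forward schedule, in which, after a short fill phase, every stage is busy with a different micro-batch. NUM_STAGES and NUM_MICROBATCHES are arbitrary example values, and the schedule shown is the generic one, not necessarily the schedule used by MINDPIPE, VPIPE, or NASPIPE.

# Illustrative sketch of pipeline-parallel scheduling (not code from the thesis).
# The model is partitioned into serial stages placed on different devices;
# each mini-batch is split into micro-batches that flow through the stages.

NUM_STAGES = 4        # the model is partitioned into 4 serial stages
NUM_MICROBATCHES = 6  # each mini-batch is split into 6 micro-batches

total_steps = NUM_STAGES + NUM_MICROBATCHES - 1
for t in range(total_steps):
    # At time step t, stage s works on micro-batch (t - s), provided that
    # micro-batch has already left stage s - 1 and has not run out.
    active = [f"stage{s} -> microbatch{t - s}"
              for s in range(NUM_STAGES)
              if 0 <= t - s < NUM_MICROBATCHES]
    print(f"step {t}: " + " | ".join(active))

Running the sketch shows the fill phase (only stage0 busy at step 0), the steady state (all four stages busy on different micro-batches), and the drain phase at the end; the efficiency and resilience questions the abstract raises concern exactly how such a schedule is partitioned, balanced, and recovered under runtime dynamicities.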
Degree: Doctor of Philosophy
Subject: Deep learning (Machine learning); Neural networks (Computer science)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/328587

 

DC Field: Value
dc.contributor.advisor: Cui, H
dc.contributor.author: Zhao, Shixiong
dc.contributor.author: 赵世雄
dc.date.accessioned: 2023-06-29T05:44:28Z
dc.date.available: 2023-06-29T05:44:28Z
dc.date.issued: 2022
dc.identifier.citation: Zhao, S. [赵世雄]. (2022). MINDPIPE: a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/328587
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Deep learning (Machine learning)
dc.subject.lcsh: Neural networks (Computer science)
dc.title: MINDPIPE: a high-performance and carbon-efficient training system for general large AI models with four dimensional parallelism
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044695782503414
