Expediting Large-scale MoE Model Training in Heterogeneous Deep Learning Clusters


Grant Data
Project Title
Expediting Large-scale MoE Model Training in Heterogeneous Deep Learning Clusters
Principal Investigator
Professor Wu, Chuan (Principal Investigator, PI)
Duration
36 months
Start Date
2024-01-01
Amount
HKD 1,320,766
Keywords
Mixture of Experts (MoE) model, Parallel training, Tensor sharding, Computation-communication scheduling, Communication optimization
Discipline
Network; Others - Computing Science and Information Technology
Panel
Engineering (E)
HKU Project Code
17204423
Grant Type
General Research Fund (GRF) 2023/24
Funding Year
2023
Status
On-going
Objectives
Refer to ES