Expediting Large-scale MoE Model Training in Heterogeneous Deep Learning Clusters


Grant Data
Project Title
Expediting Large-scale MoE Model Training in Heterogeneous Deep Learning Clusters
Principal Investigator
Professor Wu, Chuan (Principal Investigator, PI)
Duration
36 months
Start Date
2024-01-01
Amount
HKD 1,320,766
Keywords
Mixture of Experts (MoE) model, Parallel training, Tensor sharding, Computation-communication scheduling, Communication optimization
Discipline
Network; Others - Computing Science and Information Technology
Panel
Engineering (E)
HKU Project Code
17204423
Grant Type
General Research Fund (GRF) 2023/24
Funding Year
2023
Status
On-going
Objectives
Refer to ES