Optimizing Distributed GNN Training on Large Graphs


Grant Data
Project Title
Optimizing Distributed GNN Training on Large Graphs
Principal Investigator
Professor Chuan Wu
Duration
36 months
Start Date
2021-09-01
Amount
HK$1,093,580
Keywords
Distributed ML System, Flow Scheduling, GNN Training, Graph Sampling, Placement
Discipline
Network, Others - Computing Science and Information Technology
Panel
Engineering (E)
HKU Project Code
17207621
Grant Type
General Research Fund (GRF)
Funding Year
2021
Status
Ongoing
Objectives
1. [Algorithms for Communication and Computation Scheduling in Distributed GNN Training]: Design efficient, near-optimal scheduling algorithms for graph sampling/GNN gradient communication and for graph store/sampler/trainer execution, given a graph store/sampler/trainer placement.
2. [Algorithms for Graph Store, Sampler and Trainer Placement]: Design efficient, near-optimal placement strategies for graph stores, samplers and trainers that, jointly with communication/computation scheduling, minimize GNN training time.
3. [Joint Graph Partition, Sampling and Caching Design]: Design joint, efficient approaches to graph partitioning, sampling and graph data caching on training machines, to further reduce inter-machine traffic and speed up GNN training convergence.
4. [Implementation and Evaluation]: Implement a distributed GNN training system using our algorithms and strategies, and evaluate it with real-world GNN training workloads in AI clouds.
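To illustrate the graph-sampling step that the objectives above schedule and distribute, here is a minimal, illustrative sketch of layer-wise neighbor sampling for mini-batch GNN training. It is not the project's actual system; the function name, fixed-fanout strategy, and adjacency-list representation are assumptions made for this example.

```python
import random

def sample_blocks(adj, seeds, fanouts, rng=None):
    """Layer-wise neighbor sampling for one mini-batch.

    adj:     dict mapping each node to a list of its in-neighbors
    seeds:   output nodes whose embeddings the batch will compute
    fanouts: max neighbors sampled per node, one entry per GNN layer
             (input layer first)
    Returns one (nodes, edges) block per layer, input layer first:
    exactly the subgraph a trainer needs to compute the seeds.
    """
    rng = rng or random.Random(0)
    blocks = []
    frontier = list(seeds)
    for fanout in reversed(fanouts):  # sample from the output layer down
        edges = []
        next_frontier = set(frontier)
        for dst in frontier:
            nbrs = adj.get(dst, [])
            # keep all neighbors if few, else a uniform sample of `fanout`
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for src in picked:
                edges.append((src, dst))
                next_frontier.add(src)
        blocks.append((sorted(next_frontier), edges))
        frontier = list(next_frontier)
    blocks.reverse()  # reorder to input layer first
    return blocks

# Usage: a 2-layer batch on a toy graph; each block's edge set is what
# would be shipped from graph stores/samplers to a trainer.
adj = {0: [1, 2, 3], 1: [2], 2: [], 3: [0]}
blocks = sample_blocks(adj, seeds=[0], fanouts=[2, 2])
```

Bounding the fanout per layer caps the size of each sampled block, which is what keeps the sampler-to-trainer traffic that Objectives 1-3 optimize predictable and small relative to the full graph.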