Conference Paper: Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

Title: Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
Authors: Wan, Borui; Zhao, Juntao; Wu, Chuan
Issue Date: 6-Jun-2023
Abstract

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming. Frequent exchanges of node features, embeddings and embedding gradients (all referred to as messages) across devices bring significant communication overhead for nodes with remote neighbors on other devices (marginal nodes) and unnecessary waiting time for nodes without remote neighbors (central nodes) in the graph. This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph GNN training. We stochastically quantize messages transferred across devices to lower-precision integers for communication traffic reduction and advocate communication-computation parallelization between marginal nodes and central nodes. We provide theoretical analysis to prove fast training convergence (at the rate of O(T⁻¹), with T being the total number of training epochs) and design an adaptive quantization bit-width assignment scheme for each message based on the analysis, targeting a good trade-off between training convergence and efficiency. Extensive experiments on mainstream graph datasets show that AdaQP substantially improves distributed full-graph training's throughput (up to 3.01×) with negligible accuracy drop (at most 0.30%) or even accuracy improvement (up to 0.19%) in most cases, showing significant advantages over state-of-the-art works. The code is available at https://github.com/raywan-110/AdaQP.
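The traffic-reduction step described above relies on stochastic (unbiased) quantization of messages to low-bit integers before they are sent to remote devices. Below is a minimal PyTorch sketch of that idea, assuming a simple per-tensor min-max scaling scheme; the function names, the uint8 packing, and the fixed 4-bit example are illustrative assumptions, not AdaQP's actual implementation, which assigns bit-widths adaptively per message.

```python
import torch

def stochastic_quantize(msg: torch.Tensor, bits: int):
    """Stochastically round a float message tensor to `bits`-bit integers.

    Rounding up or down with probability equal to the fractional distance
    makes the quantizer unbiased: E[dequantize(quantize(x))] = x.
    """
    qmax = 2 ** bits - 1
    lo, hi = msg.min(), msg.max()
    scale = (hi - lo).clamp(min=1e-12) / qmax
    normalized = (msg - lo) / scale            # values in [0, qmax]
    floor = normalized.floor()
    prob_up = normalized - floor               # fractional part in [0, 1)
    q = floor + torch.bernoulli(prob_up)       # stochastic rounding
    dtype = torch.uint8 if bits <= 8 else torch.int16
    return q.to(dtype), lo, scale

def dequantize(q: torch.Tensor, lo: torch.Tensor, scale: torch.Tensor):
    """Recover an unbiased float estimate of the original message."""
    return q.to(torch.float32) * scale + lo

# Example: quantize a batch of node embeddings to 4 bits before sending them
# to a remote device, then dequantize on the receiving side.
embeddings = torch.randn(1024, 128)
q, lo, scale = stochastic_quantize(embeddings, bits=4)
recovered = dequantize(q, lo, scale)
```

With 4-bit messages, the transferred payload is roughly 8× smaller than 32-bit floats (plus a small per-tensor overhead for the scale and offset), which is the source of the communication savings the abstract reports.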


Persistent Identifier: http://hdl.handle.net/10722/333888

 

DC Field | Value | Language
dc.contributor.author | Wan, Borui | -
dc.contributor.author | Zhao, Juntao | -
dc.contributor.author | Wu, Chuan | -
dc.date.accessioned | 2023-10-06T08:39:55Z | -
dc.date.available | 2023-10-06T08:39:55Z | -
dc.date.issued | 2023-06-06 | -
dc.identifier.uri | http://hdl.handle.net/10722/333888 | -
dc.description.abstract | (abstract as above) | -
dc.language | eng | -
dc.relation.ispartof | the Sixth Conference on Machine Learning and Systems (MLSys), 04/06/2023-08/06/2023, Miami | -
dc.title | Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training | -
dc.type | Conference_Paper | -
dc.identifier.doi | 10.48550/arXiv.2306.01381 | -
