Conference Paper: Two-level Graph Caching for Expediting Distributed GNN Training

Title: Two-level Graph Caching for Expediting Distributed GNN Training
Authors: Zhang, Zhe; Luo, Ziyue; Wu, Chuan
Issue Date: 17-May-2023
Abstract

Graph Neural Networks (GNNs) are increasingly popular due to their excellent performance in learning on graph-structured data across various domains. With fast-growing graph sizes and feature dimensions, distributed GNN training has been widely adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature-fetching overhead and thereby expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of a fixed set of neighbor nodes. We propose a novel two-level dynamic cache design that exploits both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms take into account node request probabilities and heterogeneous feature-fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate the practical performance of our caching design with testbed experiments and show that it achieves up to 5.4x convergence speed-up.
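The abstract describes a two-level dynamic cache spanning GPU memory and host memory, with caching decisions driven by node request probabilities and heterogeneous per-server fetch costs. As a rough illustration of that structure only, the sketch below implements a simplified benefit-based two-tier cache; it is not the paper's online-optimization algorithm with lookahead batching, and every name in it (TwoLevelFeatureCache, request_prob, fetch_cost, etc.) is hypothetical.

```python
# Hypothetical illustration only: a simplified two-tier (GPU / host) feature
# cache with benefit-based eviction. The paper's actual algorithms use online
# optimization with a lookahead batching mechanism and carry an O(log^3 k)
# competitive ratio; none of the names below come from the paper.

class TierCache:
    """A single cache tier with fixed capacity; evicts the lowest-benefit node."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}    # node_id -> cached feature vector
        self.benefit = {}  # node_id -> estimated benefit of keeping it cached

    def get(self, node_id):
        return self.store.get(node_id)

    def pop(self, node_id):
        self.benefit.pop(node_id, None)
        return self.store.pop(node_id, None)

    def put(self, node_id, feature, benefit):
        """Insert a node; return (victim_id, victim_feature) if one is evicted."""
        evicted = None
        if node_id not in self.store and len(self.store) >= self.capacity:
            victim = min(self.benefit, key=self.benefit.get)
            evicted = (victim, self.pop(victim))
        self.store[node_id] = feature
        self.benefit[node_id] = benefit
        return evicted


class TwoLevelFeatureCache:
    """GPU tier backed by a host-memory tier; misses fall through to a remote fetch."""

    def __init__(self, gpu_cap, host_cap, fetch_remote, request_prob, fetch_cost):
        self.gpu = TierCache(gpu_cap)
        self.host = TierCache(host_cap)
        self.fetch_remote = fetch_remote  # node_id -> feature, from the owning server
        self.request_prob = request_prob  # node_id -> estimated request probability
        self.fetch_cost = fetch_cost      # node_id -> communication cost of a remote fetch

    def _benefit(self, node_id):
        # Expected communication saved by keeping this node's features cached.
        return self.request_prob(node_id) * self.fetch_cost(node_id)

    def lookup(self, node_id):
        feature = self.gpu.get(node_id)
        if feature is not None:
            return feature
        feature = self.host.pop(node_id)          # promote from host tier if present
        if feature is None:
            feature = self.fetch_remote(node_id)  # expensive cross-server fetch
        demoted = self.gpu.put(node_id, feature, self._benefit(node_id))
        if demoted is not None:                   # GPU eviction falls back to the host tier
            victim_id, victim_feature = demoted
            self.host.put(victim_id, victim_feature, self._benefit(victim_id))
        return feature
```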


Persistent Identifier: http://hdl.handle.net/10722/333890

 

DC Fields

dc.contributor.author: Zhang, Zhe
dc.contributor.author: Luo, Ziyue
dc.contributor.author: Wu, Chuan
dc.date.accessioned: 2023-10-06T08:39:56Z
dc.date.available: 2023-10-06T08:39:56Z
dc.date.issued: 2023-05-17
dc.identifier.uri: http://hdl.handle.net/10722/333890
dc.description.abstract: Graph Neural Networks (GNNs) are increasingly popular due to their excellent performance in learning on graph-structured data across various domains. With fast-growing graph sizes and feature dimensions, distributed GNN training has been widely adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature-fetching overhead and thereby expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of a fixed set of neighbor nodes. We propose a novel two-level dynamic cache design that exploits both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms take into account node request probabilities and heterogeneous feature-fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate the practical performance of our caching design with testbed experiments and show that it achieves up to 5.4x convergence speed-up.
dc.language: eng
dc.relation.ispartof: IEEE International Conference on Computer Communications (INFOCOM) 2023 (17/05/2023-20/05/2023, New York)
dc.title: Two-level Graph Caching for Expediting Distributed GNN Training
dc.type: Conference_Paper
dc.identifier.doi: 10.1109/INFOCOM53939.2023.10228911
