Conference Paper: Two-level Graph Caching for Expediting Distributed GNN Training

Title: Two-level Graph Caching for Expediting Distributed GNN Training
Authors: Zhang, Zhe; Luo, Ziyue; Wu, Chuan
Issue Date: 17-May-2023
Abstract

Graph Neural Networks (GNNs) are increasingly popular due to their excellent performance in learning on graph-structured data across various domains. With fast-growing graph sizes and feature dimensions, distributed GNN training has been widely adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature-fetching overhead and thereby expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of a fixed set of neighbor nodes. We propose a novel two-level dynamic cache design that exploits both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms take into account node request probabilities and heterogeneous feature-fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate the practical performance of our caching design with testbed experiments and show that it achieves up to 5.4x convergence speed-up.
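The abstract describes a two-level dynamic cache spanning GPU memory and host memory, with caching decisions driven by node request probabilities and heterogeneous per-server fetch costs. As a rough illustration of that structure only, the sketch below implements a simplified benefit-based two-tier cache; it is not the paper's online-optimization algorithm with lookahead batching, and every name in it (TwoLevelFeatureCache, request_prob, fetch_cost, etc.) is hypothetical.

```python
# Hypothetical illustration only: a simplified two-tier (GPU / host) feature
# cache with benefit-based eviction. The paper's actual algorithms use online
# optimization with a lookahead batching mechanism and carry an O(log^3 k)
# competitive ratio; none of the names below come from the paper.

class TierCache:
    """A single cache tier with fixed capacity; evicts the lowest-benefit node."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}    # node_id -> cached feature vector
        self.benefit = {}  # node_id -> estimated benefit of keeping it cached

    def get(self, node_id):
        return self.store.get(node_id)

    def pop(self, node_id):
        self.benefit.pop(node_id, None)
        return self.store.pop(node_id, None)

    def put(self, node_id, feature, benefit):
        """Insert a node; return (victim_id, victim_feature) if one is evicted."""
        evicted = None
        if node_id not in self.store and len(self.store) >= self.capacity:
            victim = min(self.benefit, key=self.benefit.get)
            evicted = (victim, self.pop(victim))
        self.store[node_id] = feature
        self.benefit[node_id] = benefit
        return evicted


class TwoLevelFeatureCache:
    """GPU tier backed by a host-memory tier; misses fall through to a remote fetch."""

    def __init__(self, gpu_cap, host_cap, fetch_remote, request_prob, fetch_cost):
        self.gpu = TierCache(gpu_cap)
        self.host = TierCache(host_cap)
        self.fetch_remote = fetch_remote  # node_id -> feature, from the owning server
        self.request_prob = request_prob  # node_id -> estimated request probability
        self.fetch_cost = fetch_cost      # node_id -> communication cost of a remote fetch

    def _benefit(self, node_id):
        # Expected communication saved by keeping this node's features cached.
        return self.request_prob(node_id) * self.fetch_cost(node_id)

    def lookup(self, node_id):
        feature = self.gpu.get(node_id)
        if feature is not None:
            return feature
        feature = self.host.pop(node_id)          # promote from host tier if present
        if feature is None:
            feature = self.fetch_remote(node_id)  # expensive cross-server fetch
        demoted = self.gpu.put(node_id, feature, self._benefit(node_id))
        if demoted is not None:                   # GPU eviction falls back to the host tier
            victim_id, victim_feature = demoted
            self.host.put(victim_id, victim_feature, self._benefit(victim_id))
        return feature
```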


Persistent Identifier: http://hdl.handle.net/10722/333890

 

DC Fields

dc.contributor.author: Zhang, Zhe
dc.contributor.author: Luo, Ziyue
dc.contributor.author: Wu, Chuan
dc.date.accessioned: 2023-10-06T08:39:56Z
dc.date.available: 2023-10-06T08:39:56Z
dc.date.issued: 2023-05-17
dc.identifier.uri: http://hdl.handle.net/10722/333890
dc.description.abstract: Graph Neural Networks (GNNs) are increasingly popular due to their excellent performance in learning on graph-structured data across various domains. With fast-growing graph sizes and feature dimensions, distributed GNN training has been widely adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature-fetching overhead and thereby expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of a fixed set of neighbor nodes. We propose a novel two-level dynamic cache design that exploits both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms take into account node request probabilities and heterogeneous feature-fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate the practical performance of our caching design with testbed experiments and show that it achieves up to 5.4x convergence speed-up.
dc.language: eng
dc.relation.ispartof: IEEE International Conference on Computer Communications (INFOCOM) 2023 (17/05/2023-20/05/2023, New York)
dc.title: Two-level Graph Caching for Expediting Distributed GNN Training
dc.type: Conference_Paper
dc.identifier.doi: 10.1109/INFOCOM53939.2023.10228911
