
Article: Democratizing large language model-based graph data augmentation via latent knowledge graphs

Title: Democratizing large language model-based graph data augmentation via latent knowledge graphs
Authors: Feng, Yushi; Chan, Tsai Hor; Yin, Guosheng; Yu, Lequan
Issue Date: 11-Jul-2025
Publisher: Elsevier
Citation: Neural Networks, 2025, v. 191
Abstract

Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most existing augmentation methods overlook the context information inherited from the dataset, as they rely solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, they are mostly white-box approaches that require access to the weights or latent features of open-access LLMs, making them difficult to democratize since the most advanced LLMs are often closed-source for commercial reasons. To overcome these limitations, we propose a black-box, context-driven graph data augmentation approach guided by LLMs, termed DemoGraph. Leveraging text prompts as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture the structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generate text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), validating its maximal utilization of contextual knowledge and leading to enhanced predictive performance and interpretability.
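The dynamic merging step described in the abstract can be illustrated with a minimal sketch. The paper's actual merging schema is defined in the article itself; the function below, its name (`merge_kg`), its parameters, and the entity-grounding `node_map` are all illustrative assumptions that only convey the general idea of stochastically injecting LLM-generated KG triples into a training graph.

```python
import random


def merge_kg(graph_edges, kg_triples, node_map, p=0.3, seed=None):
    """Stochastically merge LLM-generated KG triples into a graph (sketch).

    graph_edges: set of (u, v) node-id pairs in the original graph.
    kg_triples:  list of (head, relation, tail) strings produced by an LLM.
    node_map:    maps KG entity strings to existing node ids (grounding).
    p:           per-triple probability of injecting the edge this step.
    """
    rng = random.Random(seed)
    augmented = set(graph_edges)
    for head, _relation, tail in kg_triples:
        # Only triples whose endpoints ground to known nodes can be merged;
        # each grounded triple is injected with probability p.
        if head in node_map and tail in node_map and rng.random() < p:
            augmented.add((node_map[head], node_map[tail]))
    return augmented
```

Because the injection is re-sampled at every training step, different epochs see different augmented graphs, which is what makes the merging "dynamic"; tuning `p` is one simple way to control the sparsity of the augmented graph.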


Persistent Identifier: http://hdl.handle.net/10722/360866
ISSN: 0893-6080
2023 Impact Factor: 6.0
2023 SCImago Journal Rankings: 2.605

 

DC Field | Value | Language
dc.contributor.author | Feng, Yushi | -
dc.contributor.author | Chan, Tsai Hor | -
dc.contributor.author | Yin, Guosheng | -
dc.contributor.author | Yu, Lequan | -
dc.date.accessioned | 2025-09-16T00:31:00Z | -
dc.date.available | 2025-09-16T00:31:00Z | -
dc.date.issued | 2025-07-11 | -
dc.identifier.citation | Neural Networks, 2025, v. 191 | -
dc.identifier.issn | 0893-6080 | -
dc.identifier.uri | http://hdl.handle.net/10722/360866 | -
dc.description.abstract | <p>Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most existing augmentation methods overlook the context information inherited from the dataset, as they rely solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, they are mostly white-box approaches that require access to the weights or latent features of open-access LLMs, making them difficult to democratize since the most advanced LLMs are often closed-source for commercial reasons. To overcome these limitations, we propose a black-box, context-driven graph data augmentation approach guided by LLMs, termed <strong>DemoGraph</strong>. Leveraging text prompts as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture the structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generate text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), validating its maximal utilization of contextual knowledge and leading to enhanced predictive performance and interpretability.<br></p> | -
dc.language | eng | -
dc.publisher | Elsevier | -
dc.relation.ispartof | Neural Networks | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | Democratizing large language model-based graph data augmentation via latent knowledge graphs | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1016/j.neunet.2025.107777 | -
dc.identifier.volume | 191 | -
dc.identifier.eissn | 1879-2782 | -
dc.identifier.issnl | 0893-6080 | -
