
Article: Democratizing large language model-based graph data augmentation via latent knowledge graphs

Title: Democratizing large language model-based graph data augmentation via latent knowledge graphs
Authors: Feng, Yushi; Chan, Tsai Hor; Yin, Guosheng; Yu, Lequan
Issue Date: 11-Jul-2025
Publisher: Elsevier
Citation: Neural Networks, 2025, v. 191
Abstract

Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most existing augmentation methods overlook the context information inherited from the dataset, as they rely solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, they are mostly white-box approaches that require access to the weights or latent features of open-access LLMs, making them difficult to democratize since the most advanced LLMs are often closed-source for commercial reasons. To overcome these limitations, we propose a black-box, context-driven graph data augmentation approach guided by LLMs, termed DemoGraph. Leveraging text prompts as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture the structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generate text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), validating its maximal utilization of contextual knowledge and leading to enhanced predictive performance and interpretability.
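The dynamic merging step described in the abstract can be illustrated with a minimal sketch. The paper's actual merging schema is defined in the article itself; the function below, its name (`merge_kg`), its parameters, and the entity-grounding `node_map` are all illustrative assumptions that only convey the general idea of stochastically injecting LLM-generated KG triples into a training graph.

```python
import random


def merge_kg(graph_edges, kg_triples, node_map, p=0.3, seed=None):
    """Stochastically merge LLM-generated KG triples into a graph (sketch).

    graph_edges: set of (u, v) node-id pairs in the original graph.
    kg_triples:  list of (head, relation, tail) strings produced by an LLM.
    node_map:    maps KG entity strings to existing node ids (grounding).
    p:           per-triple probability of injecting the edge this step.
    """
    rng = random.Random(seed)
    augmented = set(graph_edges)
    for head, _relation, tail in kg_triples:
        # Only triples whose endpoints ground to known nodes can be merged;
        # each grounded triple is injected with probability p.
        if head in node_map and tail in node_map and rng.random() < p:
            augmented.add((node_map[head], node_map[tail]))
    return augmented
```

Because the injection is re-sampled at every training step, different epochs see different augmented graphs, which is what makes the merging "dynamic"; tuning `p` is one simple way to control the sparsity of the augmented graph.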


Persistent Identifier: http://hdl.handle.net/10722/360866
ISSN: 0893-6080
2023 Impact Factor: 6.0
2023 SCImago Journal Rankings: 2.605

 

DC Field | Value | Language
dc.contributor.author | Feng, Yushi | -
dc.contributor.author | Chan, Tsai Hor | -
dc.contributor.author | Yin, Guosheng | -
dc.contributor.author | Yu, Lequan | -
dc.date.accessioned | 2025-09-16T00:31:00Z | -
dc.date.available | 2025-09-16T00:31:00Z | -
dc.date.issued | 2025-07-11 | -
dc.identifier.citation | Neural Networks, 2025, v. 191 | -
dc.identifier.issn | 0893-6080 | -
dc.identifier.uri | http://hdl.handle.net/10722/360866 | -
dc.description.abstract | <p>Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most existing augmentation methods overlook the context information inherited from the dataset, as they rely solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, they are mostly white-box approaches that require access to the weights or latent features of open-access LLMs, making them difficult to democratize since the most advanced LLMs are often closed-source for commercial reasons. To overcome these limitations, we propose a black-box, context-driven graph data augmentation approach guided by LLMs, termed <strong>DemoGraph</strong>. Leveraging text prompts as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture the structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generate text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), validating its maximal utilization of contextual knowledge and leading to enhanced predictive performance and interpretability.<br></p> | -
dc.language | eng | -
dc.publisher | Elsevier | -
dc.relation.ispartof | Neural Networks | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | Democratizing large language model-based graph data augmentation via latent knowledge graphs | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1016/j.neunet.2025.107777 | -
dc.identifier.volume | 191 | -
dc.identifier.eissn | 1879-2782 | -
dc.identifier.issnl | 0893-6080 | -
