A study on privacy-preserving distributed graph mining

Zhang, Ke; 张可

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: A study on privacy-preserving distributed graph mining

Title	A study on privacy-preserving distributed graph mining
Authors	Zhang, Ke 张可
Issue Date	2022
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhang, K. [张可]. (2022). A study on privacy-preserving distributed graph mining. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Graph data is increasingly generated, collected, organized, and stored in a distributed manner by multiple data owners. In this thesis, we consider a novel yet realistic scenario in which each local system holds a small subgraph whose distribution may be biased relative to that of the entire global graph. Because of data privacy concerns and conflicts of interest, locally stored subgraphs cannot be shared directly with the public or among data owners. It is therefore natural to consider federated learning (FL) across distributed subgraphs. Unlike distributed text or image data, whose samples are independent of one another during prediction, the data samples (nodes) of a graph are correlated in graph learning tasks. Applying FL to train a graph neural network (GNN) across local subgraphs thus poses unique challenges in achieving effectiveness and privacy simultaneously. This thesis studies privacy-preserving graph learning from two aspects: distributed collaboration and training by an individual data owner. We first consider a distributed homogeneous subgraph system, where each subgraph contains only a single type of node and edge. To enable distributed data owners to conduct FL on graph data, we propose FedSage, which trains a GraphSage model with FedAvg to integrate local subgraph information. To overcome the performance degradation caused by missing links across local subgraphs, we propose FedSage+, which trains a missing neighbor generator alongside FedSage. Next, we consider a more complex scenario where the global graph is a heterogeneous graph (heterograph) containing multiple types of nodes and links. To better simulate realistic applications, we incorporate privacy considerations by categorizing nodes as private or public, where sharing private nodes is restricted.
We propose two major techniques: (1) FedHG, which trains a type-aware GCN model using sample-based normalization over FedAvg to integrate local heterographs; and (2) FedHG+, which jointly trains a type-aware missing neighbor generator with the type-aware GCN to handle incomplete local heterogeneous neighborhoods. Although FL is regarded as private because it keeps raw data from being shared, it still faces criticism over its actual privacy because gradients are shared during collaboration. We then take an initial step toward privacy-preserving collaboration among data owners. Instead of exchanging sensitive gradients across the system, we propose SC-AGG, a lightweight secure aggregation method. It treats each distributed model only as a black box and trains a global model by adaptively aggregating the local models' inference results. Without imposing constraints on local model structures or local data distributions, SC-AGG shows promising empirical results on image classification tasks. However, the techniques above focus on privacy during the collaboration process. For an individual data owner, we provide rigorous local privacy protection for the training graph, especially its relational data, using the differential privacy (DP) framework. We formulate and enforce privacy constraints, namely edge differential privacy (edge-DP), on deep graph generation models. Specifically, we inject Gaussian noise into the gradients of a link reconstruction-based graph generation model while preserving data utility by improving structure learning with structure-oriented graph comparison.
Degree	Doctor of Philosophy
Subject	Graph theory; Data mining; Data protection; Privacy
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/325826
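FedSage and FedHG, as described in the abstract, both integrate local models through FedAvg. As a rough illustration of that server-side step only, the sketch below averages each parameter tensor across data owners, weighting each owner by its share of training samples (the standard FedAvg weighting). It is a minimal sketch, not the thesis's implementation: local training is omitted, parameters are assumed to be dicts of NumPy arrays, and the function name `fedavg` and the toy values are hypothetical.

```python
import numpy as np

def fedavg(local_params, num_samples):
    """Server-side FedAvg aggregation (hypothetical helper).

    Each parameter tensor is averaged across clients, with client k
    weighted by n_k / sum_j n_j, i.e. its share of training samples.
    """
    total = float(sum(num_samples))
    weights = [n / total for n in num_samples]
    return {
        name: sum(w * params[name] for w, params in zip(weights, local_params))
        for name in local_params[0]
    }

# Two hypothetical data owners, each holding one weight matrix.
client_a = {"W": np.array([[1.0, 2.0]])}
client_b = {"W": np.array([[3.0, 6.0]])}
global_params = fedavg([client_a, client_b], num_samples=[100, 300])
print(global_params["W"].tolist())  # [[2.5, 5.0]]
```

In a full FedAvg round, the server would broadcast `global_params` back to the owners, who continue local training before the next aggregation.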

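The abstract motivates FedSage+ by the links that are lost across local subgraphs. The sketch below shows why such links matter for a GraphSage-style model, using a simplified mean-aggregation layer, h_v = ReLU(W_self x_v + W_neigh mean(x_u for u in N(v))), with neighbor sampling and normalization omitted. It assumes a hypothetical toy graph 0-1-2-3 split between two owners ({0, 1} and {2, 3}), so edge 1-2 is invisible to both. The function `sage_mean_layer`, the features, and the weights are illustrative assumptions, not the thesis's code.

```python
import numpy as np

def sage_mean_layer(features, adj, W_self, W_neigh):
    """Simplified GraphSage layer with a mean aggregator (hypothetical helper)."""
    n = features.shape[0]
    out = np.zeros((n, W_self.shape[0]))
    for v in range(n):
        neigh = adj.get(v, [])
        # A node with no visible neighbors aggregates a zero vector.
        agg = features[neigh].mean(axis=0) if neigh else np.zeros(features.shape[1])
        out[v] = np.maximum(0.0, W_self @ features[v] + W_neigh @ agg)
    return out

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
W_self = np.eye(2)
W_neigh = np.eye(2)

# Global graph 0-1-2-3 versus the two local subgraphs without edge 1-2.
global_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_adj = {0: [1], 1: [0], 2: [3], 3: [2]}

h_global = sage_mean_layer(x, global_adj, W_self, W_neigh)
h_local = sage_mean_layer(x, local_adj, W_self, W_neigh)
print(h_global[1].tolist(), h_local[1].tolist())  # [1.0, 1.5] [1.0, 1.0]
```

Node 0 has no cross-subgraph links and gets the same representation in both settings, while nodes 1 and 2 differ. A missing neighbor generator, as in FedSage+, aims to reduce this gap.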
DC Field	Value	Language
dc.contributor.author	Zhang, Ke	-
dc.contributor.author	张可	-
dc.date.accessioned	2023-03-02T16:33:08Z	-
dc.date.available	2023-03-02T16:33:08Z	-
dc.date.issued	2022	-
dc.identifier.citation	Zhang, K. [张可]. (2022). A study on privacy-preserving distributed graph mining. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/325826	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Graph theory	-
dc.subject.lcsh	Data mining	-
dc.subject.lcsh	Data protection	-
dc.subject.lcsh	Privacy	-
dc.title	A study on privacy-preserving distributed graph mining	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2023	-
dc.identifier.mmsid	991044649899203414	-
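The abstract's final technique injects Gaussian noise into the gradients of a link reconstruction-based graph generation model to enforce edge-DP. The sketch below shows the generic DP-SGD-style step this resembles: clip each edge's gradient contribution to a fixed L2 norm, sum the contributions, add Gaussian noise scaled to the clipping bound, and average. It is an assumption-laden illustration, not the thesis's mechanism. The helper `noisy_edge_gradient` and its parameters are hypothetical, and privacy accounting (tracking epsilon and delta across steps) is omitted. Per-edge clipping bounds one edge's influence only if each edge contributes to exactly one gradient term; in a GNN an edge also shapes other nodes' embeddings through message passing, so the thesis's actual sensitivity analysis may differ.

```python
import numpy as np

def noisy_edge_gradient(per_edge_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD-style noisy gradient over a batch of edges (hypothetical helper)."""
    clipped = []
    for g in per_edge_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / norm) if norm > 0 else g)
    total = np.sum(clipped, axis=0)
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_edge_grads)

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # L2 norms 5.0 and 0.5
g_private = noisy_edge_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
# With the noise switched off, only clipping acts: about [0.45, 0.6].
g_noiseless = noisy_edge_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of noisier updates; this is the utility trade-off that the abstract's structure-oriented graph comparison is meant to offset.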
