Effective and efficient algorithms for large graph analysis

Hu, Jiafeng; 胡佳锋

File Download

FullText.pdf

Links for fulltext

(May Require Subscription)

DOI: 10.5353/th_991044046695503414

Supplementary

Citations:
Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Effective and efficient algorithms for large graph analysis

Title	Effective and efficient algorithms for large graph analysis
Authors	Hu, Jiafeng 胡佳锋
Advisors	Advisor(s):Cheng, CK
Issue Date	2018
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Hu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Graph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs. First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches. Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient. Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments.
Degree	Doctor of Philosophy
Subject	Graph algorithms
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/263174

DC Field	Value	Language
dc.contributor.advisor	Cheng, CK	-
dc.contributor.author	Hu, Jiafeng	-
dc.contributor.author	胡佳锋	-
dc.date.accessioned	2018-10-16T07:34:51Z	-
dc.date.available	2018-10-16T07:34:51Z	-
dc.date.issued	2018	-
dc.identifier.citation	Hu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/263174	-
dc.description.abstract	Graph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs. First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches. Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient. Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Graph algorithms	-
dc.title	Effective and efficient algorithms for large graph analysis	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991044046695503414	-
dc.date.hkucongregation	2018	-
dc.identifier.mmsid	991044046695503414	-

File Download

Links for fulltext

(May Require Subscription)

Supplementary

postgraduate thesis: Effective and efficient algorithms for large graph analysis

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats