File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Effective and efficient algorithms for large graph analysis
Title | Effective and efficient algorithms for large graph analysis |
---|---|
Authors | |
Advisors | Advisor(s):Cheng, CK |
Issue Date | 2018 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Hu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Graph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs.
First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches.
Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient.
Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments. |
Degree | Doctor of Philosophy |
Subject | Graph algorithms |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/263174 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Cheng, CK | - |
dc.contributor.author | Hu, Jiafeng | - |
dc.contributor.author | 胡佳锋 | - |
dc.date.accessioned | 2018-10-16T07:34:51Z | - |
dc.date.available | 2018-10-16T07:34:51Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Hu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/263174 | - |
dc.description.abstract | Graph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs. First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches. Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient. Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Graph algorithms | - |
dc.title | Effective and efficient algorithms for large graph analysis | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044046695503414 | - |
dc.date.hkucongregation | 2018 | - |
dc.identifier.mmsid | 991044046695503414 | - |