File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: Effective and efficient algorithms for large graph analysis

TitleEffective and efficient algorithms for large graph analysis
Authors
Advisors
Advisor(s):Cheng, CK
Issue Date2018
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Hu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractGraph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs. First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches. Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient. Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments.
DegreeDoctor of Philosophy
SubjectGraph algorithms
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/263174

 

DC FieldValueLanguage
dc.contributor.advisorCheng, CK-
dc.contributor.authorHu, Jiafeng-
dc.contributor.author胡佳锋-
dc.date.accessioned2018-10-16T07:34:51Z-
dc.date.available2018-10-16T07:34:51Z-
dc.date.issued2018-
dc.identifier.citationHu, J. [胡佳锋]. (2018). Effective and efficient algorithms for large graph analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/263174-
dc.description.abstractGraph data are prevalent in communication networks, social media, and biological networks. This rich information source, which describes complex relationships among objects, has attracted a lot of interest from both research and industry communities. In this thesis, we study three important and emerging graph problems over three different graph models: (i) homogeneous graphs, (ii) heterogeneous graphs (or Heterogeneous Information Networks, HINs), and (iii) uncertain graphs. First, we focus on the community search problem over homogeneous graphs. Given a graph G and a set Q of query nodes, we examine the minimal Steiner Maximum-Connected Subgraph (SMCS) problem. The minimal SMCS essentially is a subgraph graph of G that contains Q with the maximum connectivity. It can be useful for product promotion, customer prediction and team assembling. We propose efficient Expand-Refine algorithms, as well as their approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve the efficiency for an important case when Q consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches. Next, we study the discovery of cliques (or complete subgraphs) in HINs. Existing clique-finding solutions often ignore the rich semantics of HINs. We thus propose motif clique, or m-clique, which redefines subgraph completeness with respect to a given motif. A motif, essentially a small subgraph pattern that indicates certain semantics, is a fundamental building block of an HIN. We investigate the maximal m-clique enumeration problem (MMCE), which finds all maximal m-cliques not contained in any other m-cliques. We present the META algorithm, which employs pruning strategies to effectively reduce the search space. We also design fast techniques to avoid generating duplicated maximal m-clique instances. Extensive experiments on large real and synthetic graphs show that META is highly effective and efficient. Finally, in many situations, graph data are noisy or inexact. These data can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Existing solutions for uncertain graph mining tasks face two problems: (1) high dimensionality – because uncertain graphs are highly complex, it is difficult to analyze them effectively; and (2) low adaptability – current solutions cannot be easily extended to handle a new mining task or a new graph model. To tackle these problems, we propose a solution called URGE. Given an uncertain graph G, URGE generates G’s embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate both low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. Extensive experiments show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshGraph algorithms-
dc.titleEffective and efficient algorithms for large graph analysis-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_991044046695503414-
dc.date.hkucongregation2018-
dc.identifier.mmsid991044046695503414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats