File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Evaluating multi-way joins over discounted hitting time
Title | Evaluating multi-way joins over discounted hitting time |
---|---|
Authors | |
Advisors | |
Issue Date | 2013 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, W. [张望达]. (2013). Evaluating multi-way joins over discounted hitting time. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5177353 |
Abstract | The prevalence of graphs in emerging applications has recently raised a lot of research interests. To acquire interesting information hidden in large graphs, tasks including link prediction, collaborative recommendation, and reputation ranking, all make use of proximities between graph nodes. The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, has shown to be useful in various applications. In this thesis, we examine a novel query, called the multi-way join (or n-way join), over DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a ranked list of n-tuples with the k highest scores, according to some aggregation function of DHT values. By extracting such top-k results, this query enables the analysis and prediction of various complex relationships among n sets of nodes on a large graph.
Since an n-way join is expensive to evaluate, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since the process of PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which saves the trouble of recomputation and allows the results of top-(m+1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. For better performance, we further examine efficient processing algorithms and pruning techniques for 2-way joins. Through extensive experiments on three real graph datasets, we show that the proposed PJ algorithm accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions. |
Degree | Master of Philosophy |
Subject | Query languages (Computer science) Graph algorithms |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/196484 |
HKU Library Item ID | b5177353 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kao, CM | - |
dc.contributor.advisor | Cheng, CK | - |
dc.contributor.author | Zhang, Wangda | - |
dc.contributor.author | 张望达 | - |
dc.date.accessioned | 2014-04-11T23:14:30Z | - |
dc.date.available | 2014-04-11T23:14:30Z | - |
dc.date.issued | 2013 | - |
dc.identifier.citation | Zhang, W. [张望达]. (2013). Evaluating multi-way joins over discounted hitting time. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5177353 | - |
dc.identifier.uri | http://hdl.handle.net/10722/196484 | - |
dc.description.abstract | The prevalence of graphs in emerging applications has recently raised a lot of research interests. To acquire interesting information hidden in large graphs, tasks including link prediction, collaborative recommendation, and reputation ranking, all make use of proximities between graph nodes. The discounted hitting time (DHT), which is a random-walk similarity measure for graph node pairs, has shown to be useful in various applications. In this thesis, we examine a novel query, called the multi-way join (or n-way join), over DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a ranked list of n-tuples with the k highest scores, according to some aggregation function of DHT values. By extracting such top-k results, this query enables the analysis and prediction of various complex relationships among n sets of nodes on a large graph. Since an n-way join is expensive to evaluate, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since the process of PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which saves the trouble of recomputation and allows the results of top-(m+1) 2-way join to be derived quickly from the top-m 2-way join results earlier computed. For better performance, we further examine efficient processing algorithms and pruning techniques for 2-way joins. Through extensive experiments on three real graph datasets, we show that the proposed PJ algorithm accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Query languages (Computer science) | - |
dc.subject.lcsh | Graph algorithms | - |
dc.title | Evaluating multi-way joins over discounted hitting time | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5177353 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5177353 | - |
dc.identifier.mmsid | 991036763379703414 | - |