Reverse Top-k search using random walk with restart

Yu, Wei; 余韡

DOI: 10.5353/th_b5194753

Appears in Collections:
- Computer Science: Theses
- HKU Theses Online

postgraduate thesis: Reverse Top-k search using random walk with restart

Title	Reverse Top-k search using random walk with restart
Authors	Yu, Wei 余韡
Advisors	Mamoulis, N
Issue Date	2013
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Yu, W. [余韡]. (2013). Reverse Top-k search using random walk with restart. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194753
Abstract	With the increasing popularity of social networking applications, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that search or analyze the data. Among various proximity measures, random walk with restart (RWR) is widely adopted because of its ability to consider the global structure of the whole network. Although RWR-based similarity search has been well studied before, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that directly applying existing methods for RWR-based similarity search to reverse top-k queries has very high computational and storage demands. To address this issue, we propose an indexing technique, paired with an on-line reverse top-k search algorithm. In the indexing step, we compute from the graph G a graph index, which is based on a K × |V| matrix, containing in each column v the K largest approximate proximity values from v to any other node in G. K is application-dependent and represents the highest value of k in a practical reverse top-k query. At each column v of the index, the approximate values are lower bounds of the K largest proximity values from v to all other nodes. Given the graph index and a reverse top-k query q (k ≤ K), we prove that the exact proximities from any node v to query q can be efficiently computed by applying the power method. By comparing these with the corresponding lower bounds taken from the k-th row of the graph index, we are able to determine which nodes are certainly not in the reverse top-k result of q.
For some of the remaining nodes, we may also be able to determine that they are certainly in the reverse top-k result of q, based on derived upper bounds for the k-th largest proximity value from them. Finally, for any candidate that remains, we progressively refine its approximate proximities until its lower or upper bound determines whether or not it belongs in the result. The proximities refined during a reverse top-k query are used to update the graph index, making its values progressively more accurate for future queries. Our experimental evaluation shows that our technique is efficient and has manageable storage requirements even when applied to very large graphs. We also show the effectiveness of reverse top-k search in the scenarios of spam detection and determining the popularity of authors.
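The query described in the abstract can be illustrated with a small brute-force sketch. This is not the indexed algorithm of the thesis; it only spells out the definition of a reverse top-k query under RWR and the lower-bound pruning test. The function names, the restart probability of 0.15, the fixed iteration count, and the toy graph are all assumptions made for illustration, and every node is assumed to have at least one out-neighbour.

```python
def rwr(adj, source, restart=0.15, iters=100):
    """Approximate RWR proximities from `source` to all nodes by power iteration.

    `adj` maps each node to the list of its out-neighbours (assumed non-empty).
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            # The walker at u moves to a uniformly chosen out-neighbour
            # with probability (1 - restart).
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With probability `restart` the walker jumps back to the source.
        nxt[source] += restart
        p = nxt
    return p


def reverse_top_k(adj, q, k, restart=0.15):
    """Brute force: nodes u != q that rank q among their k closest other nodes."""
    result = []
    for u in adj:
        if u == q:
            continue
        p = rwr(adj, u, restart)
        # k-th largest proximity from u to any node other than u itself.
        others = sorted((p[v] for v in adj if v != u), reverse=True)
        if p[q] >= others[k - 1]:
            result.append(u)
    return result


def prune(prox_to_q, kth_lower_bound):
    """Drop nodes whose proximity to q is below a lower bound on their k-th
    largest proximity; such nodes cannot be in the reverse top-k result."""
    return [u for u, p in prox_to_q.items() if p >= kth_lower_bound[u]]


# Hypothetical toy graph used only for illustration.
example_graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
```

Calling reverse_top_k(example_graph, "a", 1) runs one full power iteration per node, which is the cost that the abstract says the proposed index and bounds are meant to avoid; prune shows the elimination test that a precomputed row of lower bounds makes possible without recomputing each node's top-k list.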
Degree	Master of Philosophy
Subject	Random walks (Mathematics); Data mining - Graphic methods
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/197515
HKU Library Item ID	b5194753

DC Field	Value	Language
dc.contributor.advisor	Mamoulis, N	-
dc.contributor.author	Yu, Wei	-
dc.contributor.author	余韡	-
dc.date.accessioned	2014-05-27T23:16:40Z	-
dc.date.available	2014-05-27T23:16:40Z	-
dc.date.issued	2013	-
dc.identifier.citation	Yu, W. [余韡]. (2013). Reverse Top-k search using random walk with restart. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194753	-
dc.identifier.uri	http://hdl.handle.net/10722/197515	-
dc.description.abstract	With the increasing popularity of social networking applications, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that search or analyze the data. Among various proximity measures, random walk with restart (RWR) is widely adopted because of its ability to consider the global structure of the whole network. Although RWR-based similarity search has been well studied before, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that directly applying existing methods for RWR-based similarity search to reverse top-k queries has very high computational and storage demands. To address this issue, we propose an indexing technique, paired with an on-line reverse top-k search algorithm. In the indexing step, we compute from the graph G a graph index, which is based on a K × |V| matrix, containing in each column v the K largest approximate proximity values from v to any other node in G. K is application-dependent and represents the highest value of k in a practical reverse top-k query. At each column v of the index, the approximate values are lower bounds of the K largest proximity values from v to all other nodes. Given the graph index and a reverse top-k query q (k ≤ K), we prove that the exact proximities from any node v to query q can be efficiently computed by applying the power method. By comparing these with the corresponding lower bounds taken from the k-th row of the graph index, we are able to determine which nodes are certainly not in the reverse top-k result of q.
For some of the remaining nodes, we may also be able to determine that they are certainly in the reverse top-k result of q, based on derived upper bounds for the k-th largest proximity value from them. Finally, for any candidate that remains, we progressively refine its approximate proximities until its lower or upper bound determines whether or not it belongs in the result. The proximities refined during a reverse top-k query are used to update the graph index, making its values progressively more accurate for future queries. Our experimental evaluation shows that our technique is efficient and has manageable storage requirements even when applied to very large graphs. We also show the effectiveness of reverse top-k search in the scenarios of spam detection and determining the popularity of authors.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.subject.lcsh	Random walks (Mathematics)	-
dc.subject.lcsh	Data mining - Graphic methods	-
dc.title	Reverse Top-k search using random walk with restart	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5194753	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5194753	-
dc.identifier.mmsid	991036877839703414	-
