
postgraduate thesis: Reverse Top-k search using random walk with restart

Title: Reverse Top-k search using random walk with restart
Author: Yu, Wei (余韡)
Advisor: Mamoulis, N
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yu, W. [余韡]. (2013). Reverse Top-k search using random walk with restart. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194753
Abstract: With the increasing popularity of social networking applications, large volumes of graph data are becoming available. Large graphs are also derived by extracting structure from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block of many graph-based applications that search or analyze such data. Among the various proximity measures, random walk with restart (RWR) is widely adopted because it takes the global structure of the whole network into account. Although RWR-based similarity search has been well studied, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that directly applying existing RWR-based similarity search methods to reverse top-k queries incurs very high computational and storage costs. To address this issue, we propose an indexing technique paired with an online reverse top-k search algorithm. In the indexing step, we compute from the graph G a graph index based on a K × |V| matrix, whose column v contains the K largest approximate proximity values from v to any other node in G. K is application-dependent and represents the largest value of k expected in a practical reverse top-k query. In each column v of the index, the approximate values are lower bounds of the K largest proximity values from v to all other nodes. Given the graph index and a reverse top-k query q (k ≤ K), we prove that the exact proximities from every node v to the query node q can be computed efficiently by applying the power method. By comparing these with the corresponding lower bounds in the k-th row of the graph index, we can determine which nodes are certainly not in the reverse top-k result of q: if the proximity from v to q is below the lower bound of v's k-th largest proximity, then at least k other nodes are closer to v than q is.
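The indexing and filtering steps can be illustrated with a small Python sketch. It assumes the common RWR formulation p(v, ·) = c Σ_i ((1−c)A)^i e_v, with restart probability c and A the transition matrix normalised by out-degree; the thesis's exact formulation and its approximation scheme for the index may differ, and every name below (truncated_rwr, build_index, proximities_to, prune) is hypothetical. Truncating the power series gives entrywise lower bounds, so the k-th largest truncated value lower-bounds the true k-th largest proximity. The proximities from every node to q come from a single power iteration on the transposed system.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # node -> out-neighbours; nodes are 0..n-1 (assumption)


def truncated_rwr(graph: Graph, v: int, c: float, iters: int) -> List[float]:
    """r_t = c * sum_{i=0..t} ((1-c) A)^i e_v: an entrywise lower bound on v's RWR vector."""
    n = len(graph)
    term = [0.0] * n
    term[v] = c
    r = term[:]
    for _ in range(iters):
        nxt = [0.0] * n
        for u, outs in graph.items():
            if term[u] and outs:
                share = (1 - c) * term[u] / len(outs)
                for w in outs:
                    nxt[w] += share
        term = nxt
        r = [a + b for a, b in zip(r, term)]  # adding non-negative terms only
    return r


def build_index(graph: Graph, K: int, c: float, iters: int) -> List[List[float]]:
    """K x |V| index: index[j][v] lower-bounds the (j+1)-th largest p(v, u), u != v."""
    n = len(graph)
    index = [[0.0] * n for _ in range(K)]
    for v in range(n):
        r = truncated_rwr(graph, v, c, iters)
        top = sorted((r[u] for u in range(n) if u != v), reverse=True)[:K]
        for j, val in enumerate(top):
            index[j][v] = val
    return index


def proximities_to(graph: Graph, q: int, c: float, tol: float = 1e-12,
                   max_iters: int = 10_000) -> List[float]:
    """x[v] = p(v, q) for every v, via power iteration on x = c e_q + (1-c) A^T x."""
    n = len(graph)
    x = [0.0] * n
    for _ in range(max_iters):
        new = [c if u == q else 0.0 for u in range(n)]
        for u, outs in graph.items():
            if outs:
                # (A^T x)[u] is the average of x over u's out-neighbours
                new[u] += (1 - c) * sum(x[w] for w in outs) / len(outs)
        done = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if done:
            break
    return x


def prune(graph: Graph, index: List[List[float]], q: int, k: int,
          c: float) -> Tuple[List[float], List[int], List[int]]:
    """Split nodes into those that certainly lack q in their top-k and the rest."""
    prox = proximities_to(graph, q, c)
    pruned: List[int] = []
    rest: List[int] = []
    for v in range(len(graph)):
        if v == q:
            continue
        # p(v, q) < lower bound <= true k-th largest  =>  q is not in v's top-k
        if prox[v] < index[k - 1][v]:
            pruned.append(v)
        else:
            rest.append(v)
    return prox, pruned, rest
```

Iterating on the transposed system yields p(v, q) for all v in one pass instead of one RWR computation per node. The number of truncation steps (iters) trades index tightness against build time, and the convergence tolerance should be small relative to the gaps being compared.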
For some of the remaining nodes, we may also be able to determine that they are certainly in the reverse top-k result of q, based on derived upper bounds for the k-th largest proximity value from them. Finally, for each candidate that remains, we progressively refine its approximate proximities until its lower or upper bound shows whether or not it belongs to the result. The proximities refined while answering a reverse top-k query are used to update the graph index, making its values progressively more accurate for future queries. Our experimental evaluation shows that our technique is efficient and has manageable storage requirements even on very large graphs. We also demonstrate the effectiveness of reverse top-k search in two scenarios: spam detection and determining the popularity of authors.
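The certification and refinement steps can be sketched under the same hypothetical setting. It uses the same assumed RWR formulation and out-degree-normalised A. After t power-method steps from v, the mass still missing from v's RWR vector is at most (1−c)^(t+1), even with dangling nodes, so no single entry can grow by more than that. Adding this residual to the k-th largest truncated value therefore gives an upper bound. The sketch compares p(v, q) with the k-th largest proximity among nodes other than v and q, which counts ties in q's favour. For simplicity it restarts refinement from v rather than resuming stored state, and it folds refined bounds back into the index with max, since the maximum of two lower bounds is still a lower bound. The names refine_candidate and update_index, and this particular residual bound, are illustrative assumptions, not the thesis's algorithm.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]  # node -> out-neighbours; nodes are 0..n-1 (assumption)


def rwr_step(graph: Graph, term: List[float], c: float) -> List[float]:
    """One power-method term: (1 - c) * A @ term, with A normalised by out-degree."""
    nxt = [0.0] * len(graph)
    for u, outs in graph.items():
        if term[u] and outs:
            share = (1 - c) * term[u] / len(outs)
            for w in outs:
                nxt[w] += share
    return nxt


def kth_among(vec: List[float], skip: Tuple[int, ...], k: int) -> Optional[float]:
    """k-th largest entry of vec over nodes not in skip (None if fewer than k exist)."""
    vals = sorted((x for u, x in enumerate(vec) if u not in skip), reverse=True)
    return vals[k - 1] if len(vals) >= k else None


def refine_candidate(graph: Graph, v: int, q: int, k: int, p_vq: float, c: float,
                     max_iters: int = 500) -> Tuple[Optional[bool], List[float]]:
    """Refine v's RWR vector until q is provably in (True) or out of (False) v's top-k."""
    term = [0.0] * len(graph)
    term[v] = c
    r = term[:]           # r_t: entrywise lower bound on v's exact RWR vector
    residual = 1 - c      # (1-c)^(t+1): bound on the mass still missing from any entry
    for _ in range(max_iters):
        lower = kth_among(r, (v, q), k)   # lower bound on the k-th competitor of q
        if lower is None:                 # fewer than k competitors: q is trivially in
            return True, r
        if p_vq < lower:                  # k competitors certainly beat q
            return False, r
        if p_vq >= lower + residual:      # no k-th competitor can reach p(v, q)
            return True, r
        term = rwr_step(graph, term, c)
        r = [a + b for a, b in zip(r, term)]
        residual *= 1 - c
    return None, r                        # undecided, e.g. an exact tie


def update_index(index: List[List[float]], v: int, r: List[float]) -> None:
    """Raise column v of the K x |V| index using a refined lower-bound vector r."""
    top = sorted((x for u, x in enumerate(r) if u != v), reverse=True)
    for j in range(min(len(index), len(top))):
        index[j][v] = max(index[j][v], top[j])
```

In a full query loop, refine_candidate would run only for the nodes left undecided by the lower-bound filter. Calling update_index afterwards lets later queries start from tighter bounds.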
Degree: Master of Philosophy
Subject: Random walks (Mathematics); Data mining - Graphic methods
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/197515
HKU Library Item ID: b5194753

 

DC Field | Value | Language
dc.contributor.advisor | Mamoulis, N | -
dc.contributor.author | Yu, Wei | -
dc.contributor.author | 余韡 | -
dc.date.accessioned | 2014-05-27T23:16:40Z | -
dc.date.available | 2014-05-27T23:16:40Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Yu, W. [余韡]. (2013). Reverse Top-k search using random walk with restart. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194753 | -
dc.identifier.uri | http://hdl.handle.net/10722/197515 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Random walks (Mathematics) | -
dc.subject.lcsh | Data mining - Graphic methods | -
dc.title | Reverse Top-k search using random walk with restart | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5194753 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5194753 | -
dc.identifier.mmsid | 991036877839703414 | -
