File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Efficient recommendation with large-scale matrix data using indexing and approximation
Title | Efficient recommendation with large-scale matrix data using indexing and approximation |
---|---|
Authors | |
Advisors | |
Issue Date | 2019 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Qian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Nowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study
efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications.
First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications.
Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods.
Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$.
We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation. |
Degree | Doctor of Philosophy |
Subject | Big data Matrices Approximation algorithms |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/274663 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kao, CM | - |
dc.contributor.advisor | Mamoulis, N | - |
dc.contributor.author | Qian, Yuqiu | - |
dc.contributor.author | 錢宇秋 | - |
dc.date.accessioned | 2019-09-09T07:21:28Z | - |
dc.date.available | 2019-09-09T07:21:28Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Qian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/274663 | - |
dc.description.abstract | Nowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications. First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications. Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods. Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$. We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Big data | - |
dc.subject.lcsh | Matrices | - |
dc.subject.lcsh | Approximation algorithms | - |
dc.title | Efficient recommendation with large-scale matrix data using indexing and approximation | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044138427603414 | - |
dc.date.hkucongregation | 2019 | - |
dc.identifier.mmsid | 991044138427603414 | - |