File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: Efficient recommendation with large-scale matrix data using indexing and approximation

TitleEfficient recommendation with large-scale matrix data using indexing and approximation
Authors
Advisors
Issue Date2019
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Qian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
AbstractNowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications. First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications. Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods. Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$. We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation.
DegreeDoctor of Philosophy
SubjectBig data
Matrices
Approximation algorithms
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/274663

 

DC FieldValueLanguage
dc.contributor.advisorKao, CM-
dc.contributor.advisorMamoulis, N-
dc.contributor.authorQian, Yuqiu-
dc.contributor.author錢宇秋-
dc.date.accessioned2019-09-09T07:21:28Z-
dc.date.available2019-09-09T07:21:28Z-
dc.date.issued2019-
dc.identifier.citationQian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.-
dc.identifier.urihttp://hdl.handle.net/10722/274663-
dc.description.abstractNowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications. First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications. Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods. Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$. We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.subject.lcshBig data-
dc.subject.lcshMatrices-
dc.subject.lcshApproximation algorithms-
dc.titleEfficient recommendation with large-scale matrix data using indexing and approximation-
dc.typePG_Thesis-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_991044138427603414-
dc.date.hkucongregation2019-
dc.identifier.mmsid991044138427603414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats