Efficient recommendation with large-scale matrix data using indexing and approximation

Qian, Yuqiu; 錢宇秋

File Download

FullText.pdf

Links for fulltext

(May Require Subscription)

DOI: 10.5353/th_991044138427603414

Supplementary

Citations:
Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Efficient recommendation with large-scale matrix data using indexing and approximation

Title	Efficient recommendation with large-scale matrix data using indexing and approximation
Authors	Qian, Yuqiu 錢宇秋
Advisors	Advisor(s):Kao, CM Mamoulis, N
Issue Date	2019
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Qian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Nowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications. First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications. Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods. Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$. We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation.
Degree	Doctor of Philosophy
Subject	Big data Matrices Approximation algorithms
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/274663

DC Field	Value	Language
dc.contributor.advisor	Kao, CM	-
dc.contributor.advisor	Mamoulis, N	-
dc.contributor.author	Qian, Yuqiu	-
dc.contributor.author	錢宇秋	-
dc.date.accessioned	2019-09-09T07:21:28Z	-
dc.date.available	2019-09-09T07:21:28Z	-
dc.date.issued	2019	-
dc.identifier.citation	Qian, Y. [錢宇秋]. (2019). Efficient recommendation with large-scale matrix data using indexing and approximation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/274663	-
dc.description.abstract	Nowadays we are living in an era of big data with exploding volume and variety. In this thesis, we model various forms of structured and unstructured data as matrices, and study efficient recommendation problems based on large-scale matrix data. Specifically, we show that building indices and using proper approximation techniques can be of great value in speeding up real recommendation applications. First, we study the reverse $k$-ranks query as a tool of \emph{graph-based recommender systems}. Given a collection of objects, the reverse $k$-ranks query takes as input a query object $q$ in the set and returns the $k$ objects that rank $q$ higher compared to where other objects rank $q$. To solve this query in the context of graphs, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search for the reverse $k$-ranks query results. We present an optimized algorithm and an index that apply on this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs in real applications. Next, we study meeting point queries (i.e. recommending a place to a group of users) in the context of a \emph{location-aware recommender system}. Such meeting point queries have been studied to take as input both the locations of users and their preference keywords. However, in many applications, users may not explicitly specify any preferences. Motivated by this, we study the scenario of group recommendation for such passive users by inferring their preferences using topic modeling. We propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index, which improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods. Finally, we study the general \emph{nonnegative matrix factorization (NMF)} problem in a distributed setting. NMF is the problem of determining two nonnegative low rank matrices $U$ and $V$, for a given input matrix $M$, such that $M\approx UV\T$. We propose a framework (called DSANLS) for NMF, which utilizes a matrix sketching (i.e. approximation) technique to reduce the size of nonnegative least squares subproblems in each iteration for $U$ and $V$. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Big data	-
dc.subject.lcsh	Matrices	-
dc.subject.lcsh	Approximation algorithms	-
dc.title	Efficient recommendation with large-scale matrix data using indexing and approximation	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991044138427603414	-
dc.date.hkucongregation	2019	-
dc.identifier.mmsid	991044138427603414	-

File Download

Links for fulltext

(May Require Subscription)

Supplementary

postgraduate thesis: Efficient recommendation with large-scale matrix data using indexing and approximation

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats