Sparse representation and fast processing of massive data

Li, Mingfei; 李明飞

DOI: 10.5353/th_b4961797

Appears in Collections:
- Computer Science & Information Systems: Theses
- HKU Theses Online

postgraduate thesis: Sparse representation and fast processing of massive data

Title	Sparse representation and fast processing of massive data
Authors	Li, Mingfei; 李明飞
Advisors	Chan, HTH
Issue Date	2012
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797
Abstract	Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in an effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study fault-tolerant spanners in doubling metrics. A subgraph H for a metric space X is called a k-vertex-fault-tolerant t-spanner ((k, t)-VFTS or simply k-VFTS) if, for any subset S ⊆ X with |S| ≤ k, it holds that d_{H \ S}(x, y) ≤ t · d(x, y) for any pair x, y ∈ X \ S. For any doubling metric, we give a basic construction of k-VFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hop-diameter, which is studied in the context of fault tolerance for the first time even for Euclidean spanners. We provide a construction of k-VFTS with bounded hop-diameter: for m ≥ 2n, we can reduce the hop-diameter of the above k-VFTS to O(α(m, n)) by adding O(km) edges, where α is a functional inverse of Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic k-VFTS. As a result, we get a k-VFTS with O(k^2 n) edges and maximum degree O(k^2).
Degree	Master of Philosophy
Subject	Data mining. Sparse matrices.
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/181480
HKU Library Item ID	b4961797
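
The k-VFTS definition quoted in the abstract can be checked directly on very small inputs. The following is only an illustrative brute-force sketch, not a construction from the thesis: the function names (shortest_path, is_vfts), the Euclidean example points, and the floating-point tolerance are all assumptions made for this example. It enumerates every fault set S with |S| ≤ k and verifies that the surviving subgraph still has stretch at most t.

```python
from itertools import combinations
import heapq
import math


def shortest_path(adj, src, dst, removed):
    """Dijkstra distance from src to dst, skipping vertices in `removed`."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            if v in removed:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # dst is unreachable once the faulty vertices are gone


def is_vfts(points, edges, k, t, metric=math.dist):
    """Brute-force test of the (k, t)-VFTS condition (hypothetical helper).

    points: list of coordinate tuples; edges: pairs of indices into points.
    Exponential in k, so only suitable for tiny illustrative inputs.
    """
    n = len(points)
    adj = {i: {} for i in range(n)}
    for u, v in edges:
        w = metric(points[u], points[v])
        adj[u][v] = w
        adj[v][u] = w
    for size in range(k + 1):
        for faulty in combinations(range(n), size):
            removed = set(faulty)
            alive = [i for i in range(n) if i not in removed]
            for x, y in combinations(alive, 2):
                bound = t * metric(points[x], points[y])
                # Small tolerance (an assumption) to absorb rounding error.
                if shortest_path(adj, x, y, removed) > bound + 1e-12:
                    return False
    return True


# Example: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
complete = list(combinations(range(4), 2))
```

On this example the 4-cycle tolerates one vertex fault with stretch 1.5 (a diagonal pair is joined by a path of length 2 against a distance of √2), but it fails for two faults because removing two opposite corners disconnects the remaining pair. The complete graph trivially satisfies the condition with stretch 1.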

DC Field	Value	Language
dc.contributor.advisor	Chan, HTH	-
dc.contributor.author	Li, Mingfei.	-
dc.contributor.author	李明飞.	-
dc.date.accessioned	2013-03-03T03:19:58Z	-
dc.date.available	2013-03-03T03:19:58Z	-
dc.date.issued	2012	-
dc.identifier.citation	Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797	-
dc.identifier.uri	http://hdl.handle.net/10722/181480	-
dc.description.abstract	Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in an effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study fault-tolerant spanners in doubling metrics. A subgraph H for a metric space X is called a k-vertex-fault-tolerant t-spanner ((k, t)-VFTS or simply k-VFTS) if, for any subset S ⊆ X with |S| ≤ k, it holds that d_{H \ S}(x, y) ≤ t · d(x, y) for any pair x, y ∈ X \ S. For any doubling metric, we give a basic construction of k-VFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hop-diameter, which is studied in the context of fault tolerance for the first time even for Euclidean spanners. We provide a construction of k-VFTS with bounded hop-diameter: for m ≥ 2n, we can reduce the hop-diameter of the above k-VFTS to O(α(m, n)) by adding O(km) edges, where α is a functional inverse of Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic k-VFTS. As a result, we get a k-VFTS with O(k^2 n) edges and maximum degree O(k^2).	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.source.uri	http://hub.hku.hk/bib/B49617977	-
dc.subject.lcsh	Data mining.	-
dc.subject.lcsh	Sparse matrices.	-
dc.title	Sparse representation and fast processing of massive data	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b4961797	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b4961797	-
dc.date.hkucongregation	2013	-
dc.identifier.mmsid	991034141229703414	-
