File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Sparse representation and fast processing of massive data
Title | Sparse representation and fast processing of massive data |
---|---|
Authors | |
Advisors | Advisor(s):Chan, HTH |
Issue Date | 2012 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797 |
Abstract | Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in a effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space.
Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy-hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator.
We also study fault-tolerant spanners in doubling metrics. A subgraph H for a metric space X is called a k-vertex-fault-tolerant t-spanner ((k; t)-VFTS or simply k-VFTS), if for any subset S _ X with |Sj|≤k, it holds that dHnS(x; y) ≤ t ∙d(x; y), for any pair of x, y ∈ X \ S.
For any doubling metric, we give a basic construction of k-VFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hop-diameter, which is studied in the context of fault-tolerance for the first time even for Euclidean spanners. We provide a construction of k-VFTS with bounded hop-diameter: for m ≥2n, we can reduce the hop-diameter of the above k-VFTS to O(α(m; n)) by adding O(km) edges, where α is a functional inverse of the Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic k-VFTS. As a result, we get a k-VFTS with O(k^2n) edges and maximum degree O(k^2). |
Degree | Master of Philosophy |
Subject | Data mining. Sparse matrices. |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/181480 |
HKU Library Item ID | b4961797 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Chan, HTH | - |
dc.contributor.author | Li, Mingfei. | - |
dc.contributor.author | 李明飞. | - |
dc.date.accessioned | 2013-03-03T03:19:58Z | - |
dc.date.available | 2013-03-03T03:19:58Z | - |
dc.date.issued | 2012 | - |
dc.identifier.citation | Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797 | - |
dc.identifier.uri | http://hdl.handle.net/10722/181480 | - |
dc.description.abstract | Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in a effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy-hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study fault-tolerant spanners in doubling metrics. A subgraph H for a metric space X is called a k-vertex-fault-tolerant t-spanner ((k; t)-VFTS or simply k-VFTS), if for any subset S _ X with |Sj|≤k, it holds that dHnS(x; y) ≤ t ∙d(x; y), for any pair of x, y ∈ X \ S. For any doubling metric, we give a basic construction of k-VFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hop-diameter, which is studied in the context of fault-tolerance for the first time even for Euclidean spanners. We provide a construction of k-VFTS with bounded hop-diameter: for m ≥2n, we can reduce the hop-diameter of the above k-VFTS to O(α(m; n)) by adding O(km) edges, where α is a functional inverse of the Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic k-VFTS. As a result, we get a k-VFTS with O(k^2n) edges and maximum degree O(k^2). | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.source.uri | http://hub.hku.hk/bib/B49617977 | - |
dc.subject.lcsh | Data mining. | - |
dc.subject.lcsh | Sparse matrices. | - |
dc.title | Sparse representation and fast processing of massive data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b4961797 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b4961797 | - |
dc.date.hkucongregation | 2013 | - |
dc.identifier.mmsid | 991034141229703414 | - |