File Download
Supplementary

Citations:
 Appears in Collections:
postgraduate thesis: Sparse representation and fast processing of massive data
Title  Sparse representation and fast processing of massive data 

Authors  
Advisors  Advisor(s):Chan, HTH 
Issue Date  2012 
Publisher  The University of Hong Kong (Pokfulam, Hong Kong) 
Citation  Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797 
Abstract  Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in a effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space.
Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavyhitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator.
We also study faulttolerant spanners in doubling metrics. A subgraph H for a metric space X is called a kvertexfaulttolerant tspanner ((k; t)VFTS or simply kVFTS), if for any subset S _ X with Sj≤k, it holds that dHnS(x; y) ≤ t ∙d(x; y), for any pair of x, y ∈ X \ S.
For any doubling metric, we give a basic construction of kVFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hopdiameter, which is studied in the context of faulttolerance for the first time even for Euclidean spanners. We provide a construction of kVFTS with bounded hopdiameter: for m ≥2n, we can reduce the hopdiameter of the above kVFTS to O(α(m; n)) by adding O(km) edges, where α is a functional inverse of the Ackermann's function. In addition, we construct a faulttolerant singlesink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic kVFTS. As a result, we get a kVFTS with O(k^2n) edges and maximum degree O(k^2). 
Degree  Master of Philosophy 
Subject  Data mining. Sparse matrices. 
Dept/Program  Computer Science 
Persistent Identifier  http://hdl.handle.net/10722/181480 
HKU Library Item ID  b4961797 
DC Field  Value  Language 

dc.contributor.advisor  Chan, HTH   
dc.contributor.author  Li, Mingfei.   
dc.contributor.author  李明飞.   
dc.date.accessioned  20130303T03:19:58Z   
dc.date.available  20130303T03:19:58Z   
dc.date.issued  2012   
dc.identifier.citation  Li, M. [李明飞]. (2012). Sparse representation and fast processing of massive data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961797   
dc.identifier.uri  http://hdl.handle.net/10722/181480   
dc.description.abstract  Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in a effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavyhitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study faulttolerant spanners in doubling metrics. A subgraph H for a metric space X is called a kvertexfaulttolerant tspanner ((k; t)VFTS or simply kVFTS), if for any subset S _ X with Sj≤k, it holds that dHnS(x; y) ≤ t ∙d(x; y), for any pair of x, y ∈ X \ S. For any doubling metric, we give a basic construction of kVFTS with stretch arbitrarily close to 1 that has optimal O(kn) edges. We also consider bounded hopdiameter, which is studied in the context of faulttolerance for the first time even for Euclidean spanners. We provide a construction of kVFTS with bounded hopdiameter: for m ≥2n, we can reduce the hopdiameter of the above kVFTS to O(α(m; n)) by adding O(km) edges, where α is a functional inverse of the Ackermann's function. In addition, we construct a faulttolerant singlesink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic kVFTS. As a result, we get a kVFTS with O(k^2n) edges and maximum degree O(k^2).   
dc.language  eng   
dc.publisher  The University of Hong Kong (Pokfulam, Hong Kong)   
dc.relation.ispartof  HKU Theses Online (HKUTO)   
dc.rights  The author retains all proprietary rights, (such as patent rights) and the right to use in future works.   
dc.rights  This work is licensed under a Creative Commons AttributionNonCommercialNoDerivatives 4.0 International License.   
dc.source.uri  http://hub.hku.hk/bib/B49617977   
dc.subject.lcsh  Data mining.   
dc.subject.lcsh  Sparse matrices.   
dc.title  Sparse representation and fast processing of massive data   
dc.type  PG_Thesis   
dc.identifier.hkul  b4961797   
dc.description.thesisname  Master of Philosophy   
dc.description.thesislevel  Master   
dc.description.thesisdiscipline  Computer Science   
dc.description.nature  published_or_final_version   
dc.identifier.doi  10.5353/th_b4961797   
dc.date.hkucongregation  2013   