File Download
Supplementary

Citations:
 Appears in Collections:
postgraduate thesis: New results on estimating sortedness
Title  New results on estimating sortedness 

Authors  
Advisors  
Issue Date  2011 
Publisher  The University of Hong Kong (Pokfulam, Hong Kong) 
Citation  Pan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850 
Abstract  Estimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions.
One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are nondecreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming wellunderstood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1
_2 log2(_w)) space.
We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an outoforder stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constantapproximate algorithm requires linear space.
Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem HiddenIS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for HiddenIS, showing that HiddenIS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the HiddenIS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem. 
Degree  Master of Philosophy 
Subject  Sorting (Electronic computers) 
Dept/Program  Computer Science 
Persistent Identifier  http://hdl.handle.net/10722/174329 
HKU Library Item ID  b4716850 
DC Field  Value  Language 

dc.contributor.advisor  Chan, HL   
dc.contributor.advisor  Lam, TW   
dc.contributor.author  Pan, Jiangwei.   
dc.contributor.author  潘江伟.   
dc.date.issued  2011   
dc.identifier.citation  Pan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850   
dc.identifier.uri  http://hdl.handle.net/10722/174329   
dc.description.abstract  Estimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions. One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are nondecreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming wellunderstood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1 _2 log2(_w)) space. We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an outoforder stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constantapproximate algorithm requires linear space. Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem HiddenIS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for HiddenIS, showing that HiddenIS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the HiddenIS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem.   
dc.language  eng   
dc.publisher  The University of Hong Kong (Pokfulam, Hong Kong)   
dc.relation.ispartof  HKU Theses Online (HKUTO)   
dc.rights  The author retains all proprietary rights, (such as patent rights) and the right to use in future works.   
dc.rights  Creative Commons: AttributionNonCommerical 3.0 Hong Kong License   
dc.source.uri  http://hub.hku.hk/bib/B4716850X   
dc.subject.lcsh  Sorting (Electronic computers)   
dc.title  New results on estimating sortedness   
dc.type  PG_Thesis   
dc.identifier.hkul  b4716850   
dc.description.thesisname  Master of Philosophy   
dc.description.thesislevel  Master   
dc.description.thesisdiscipline  Computer Science   
dc.description.nature  published_or_final_version   
dc.identifier.doi  10.5353/th_b4716850   
dc.date.hkucongregation  2012   