File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: New results on estimating sortedness
Title | New results on estimating sortedness |
---|---|
Authors | |
Advisors | |
Issue Date | 2011 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Pan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850 |
Abstract | Estimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions.
One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1
_2 log2(_w)) space.
We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)-approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space.
Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem Hidden-IS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for Hidden-IS, showing that Hidden-IS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the Hidden-IS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem. |
Degree | Master of Philosophy |
Subject | Sorting (Electronic computers) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/174329 |
HKU Library Item ID | b4716850 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Chan, HL | - |
dc.contributor.advisor | Lam, TW | - |
dc.contributor.author | Pan, Jiangwei. | - |
dc.contributor.author | 潘江伟. | - |
dc.date.issued | 2011 | - |
dc.identifier.citation | Pan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850 | - |
dc.identifier.uri | http://hdl.handle.net/10722/174329 | - |
dc.description.abstract | Estimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions. One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1 _2 log2(_w)) space. We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)-approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space. Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem Hidden-IS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for Hidden-IS, showing that Hidden-IS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the Hidden-IS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.source.uri | http://hub.hku.hk/bib/B4716850X | - |
dc.subject.lcsh | Sorting (Electronic computers) | - |
dc.title | New results on estimating sortedness | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b4716850 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b4716850 | - |
dc.date.hkucongregation | 2012 | - |
dc.identifier.mmsid | 991032833259703414 | - |