File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: New results on estimating sortedness

TitleNew results on estimating sortedness
Authors
Advisors
Advisor(s):Chan, HLLam, TW
Issue Date2011
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Pan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850
AbstractEstimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions. One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1 _2 log2(_w)) space. We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)-approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space. Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem Hidden-IS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for Hidden-IS, showing that Hidden-IS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the Hidden-IS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem.
DegreeMaster of Philosophy
SubjectSorting (Electronic computers)
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/174329
HKU Library Item IDb4716850

 

DC FieldValueLanguage
dc.contributor.advisorChan, HL-
dc.contributor.advisorLam, TW-
dc.contributor.authorPan, Jiangwei.-
dc.contributor.author潘江伟.-
dc.date.issued2011-
dc.identifier.citationPan, J. [潘江伟]. (2011). New results on estimating sortedness. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4716850-
dc.identifier.urihttp://hdl.handle.net/10722/174329-
dc.description.abstractEstimating the sortedness of a sequence has found applications in, e.g., sorting algorithms, database management and webpage ranking. As the data volume in many of these applications is massive, recent research has been focusing on estimating sortedness in the data stream model. In this thesis, we extend the study of this problem to a number of directions. One common measurement of sortedness is the edit distance to monotonicity. Given a stream of items drawn from a totally ordered set, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w _ 1. We give a deterministic algorithm which can return an estimate within a factor of (4 + _) using O( 1 _2 log2(_w)) space. We further extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4+_)-approximate algorithm using O( 1_2 log _2w log w) space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space. Finally, we revisit the classical problem of estimating the length of the longest increasing subsequence (LIS) of a data stream. Previous work shows that any deterministic algorithm requires ?(pN) space through a communication problem Hidden-IS, where N is the number of items in the stream. But the randomized space complexity of LIS is open [2]. [23] has given an efficient randomized protocol for Hidden-IS, showing that Hidden-IS may be significantly easier than LIS. We give an even simpler and more efficient randomized protocol for the Hidden-IS problem, indicating that it is unlikely that this communication problem can lead to a polynomial randomized space lower bound for the LIS problem. On the positive side, we propose a new communication problem which we conjecture to be hard enough to lead to a super polylogarithmic randomized space lower bound for the LIS problem.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.source.urihttp://hub.hku.hk/bib/B4716850X-
dc.subject.lcshSorting (Electronic computers)-
dc.titleNew results on estimating sortedness-
dc.typePG_Thesis-
dc.identifier.hkulb4716850-
dc.description.thesisnameMaster of Philosophy-
dc.description.thesislevelMaster-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_b4716850-
dc.date.hkucongregation2012-
dc.identifier.mmsid991032833259703414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats