File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Conference Paper: Exploiting duality in summarization with deterministic guarantees

TitleExploiting duality in summarization with deterministic guarantees
Authors
KeywordsEfficiency
Histograms
Synopses
Wavelets
Issue Date2007
Citation
Proceedings Of The Acm Sigkdd International Conference On Knowledge Discovery And Data Mining, 2007, p. 380-389 How to Cite?
AbstractSummarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a Blog2n over log*factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of log2B over log*+ logn in time and B(1-log B over log n) in space, where *is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM.
Persistent Identifierhttp://hdl.handle.net/10722/151902
References

 

DC FieldValueLanguage
dc.contributor.authorKarras, Pen_US
dc.contributor.authorMamoulis, Nen_US
dc.contributor.authorSacharidis, Den_US
dc.date.accessioned2012-06-26T06:30:36Z-
dc.date.available2012-06-26T06:30:36Z-
dc.date.issued2007en_US
dc.identifier.citationProceedings Of The Acm Sigkdd International Conference On Knowledge Discovery And Data Mining, 2007, p. 380-389en_US
dc.identifier.urihttp://hdl.handle.net/10722/151902-
dc.description.abstractSummarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a Blog2n over log*factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of log2B over log*+ logn in time and B(1-log B over log n) in space, where *is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM.en_US
dc.languageengen_US
dc.relation.ispartofProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Miningen_US
dc.subjectEfficiencyen_US
dc.subjectHistogramsen_US
dc.subjectSynopsesen_US
dc.subjectWaveletsen_US
dc.titleExploiting duality in summarization with deterministic guaranteesen_US
dc.typeConference_Paperen_US
dc.identifier.emailMamoulis, N:nikos@cs.hku.hken_US
dc.identifier.authorityMamoulis, N=rp00155en_US
dc.description.naturelink_to_subscribed_fulltexten_US
dc.identifier.doi10.1145/1281192.1281235en_US
dc.identifier.scopuseid_2-s2.0-36949013904en_US
dc.relation.referenceshttp://www.scopus.com/mlt/select.url?eid=2-s2.0-36949013904&selection=ref&src=s&origin=recordpageen_US
dc.identifier.spage380en_US
dc.identifier.epage389en_US
dc.identifier.scopusauthoridKarras, P=14028488200en_US
dc.identifier.scopusauthoridMamoulis, N=6701782749en_US
dc.identifier.scopusauthoridSacharidis, D=10739131500en_US

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats