Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/1281192.1281235
- Scopus: eid_2-s2.0-36949013904
- Appears in Collections:
Conference Paper: Exploiting duality in summarization with deterministic guarantees
Title | Exploiting duality in summarization with deterministic guarantees |
---|---|
Authors | Karras, P; Mamoulis, N; Sacharidis, D |
Keywords | Efficiency Histograms Synopses Wavelets |
Issue Date | 2007 |
Citation | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, p. 380-389 |
Abstract | Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$, but also on the space budget $B$. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a $\frac{B \log^2 n}{\log \epsilon^*}$ factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM. |
Persistent Identifier | http://hdl.handle.net/10722/151902 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Karras, P | en_US |
dc.contributor.author | Mamoulis, N | en_US |
dc.contributor.author | Sacharidis, D | en_US |
dc.date.accessioned | 2012-06-26T06:30:36Z | - |
dc.date.available | 2012-06-26T06:30:36Z | - |
dc.date.issued | 2007 | en_US |
dc.identifier.citation | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, p. 380-389 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/151902 | - |
dc.description.abstract | Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$, but also on the space budget $B$. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a $\frac{B \log^2 n}{\log \epsilon^*}$ factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM. | en_US |
dc.language | eng | en_US |
dc.relation.ispartof | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | en_US |
dc.subject | Efficiency | en_US |
dc.subject | Histograms | en_US |
dc.subject | Synopses | en_US |
dc.subject | Wavelets | en_US |
dc.title | Exploiting duality in summarization with deterministic guarantees | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_US |
dc.identifier.authority | Mamoulis, N=rp00155 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.doi | 10.1145/1281192.1281235 | en_US |
dc.identifier.scopus | eid_2-s2.0-36949013904 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-36949013904&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.spage | 380 | en_US |
dc.identifier.epage | 389 | en_US |
dc.identifier.scopusauthorid | Karras, P=14028488200 | en_US |
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_US |
dc.identifier.scopusauthorid | Sacharidis, D=10739131500 | en_US |
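
The duality idea stated in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It is a simplified illustration that assumes integer data, a piecewise-constant histogram, and the midrange of each bucket as its representative value, so that a bucket's maximum absolute error is half its spread (max minus min). The function names `min_buckets` and `best_histogram` are hypothetical. The dual problem (fewest buckets for a given error bound) is solved by a greedy left-to-right scan, and the primal problem (smallest error for a budget of B buckets) is then answered by binary search over the error bound, without tabulating allocations of space to regions of the data.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int, float]  # (start index, inclusive end index, representative)


def min_buckets(data: Sequence[int], max_spread: int) -> List[Bucket]:
    """Dual problem: fewest contiguous buckets whose spread is <= max_spread.

    Each bucket is represented by its midrange, so its maximum absolute
    error equals spread / 2. Extending a bucket greedily is optimal here
    because the spread never decreases as a bucket grows.
    """
    buckets: List[Bucket] = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if new_hi - new_lo > max_spread:
            # Close the current bucket and open a new one at position i.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def best_histogram(data: Sequence[int], budget: int) -> Tuple[float, List[Bucket]]:
    """Primal problem: minimum max-abs-error histogram with at most `budget` buckets.

    Binary-searches the smallest integer spread for which the dual
    solution fits within the budget (assumption: integer data, so every
    achievable spread is an integer).
    """
    if not data or budget < 1:
        raise ValueError("need non-empty data and a budget of at least 1")
    lo, hi = 0, max(data) - min(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(min_buckets(data, mid)) <= budget:
            hi = mid  # feasible: try a tighter error bound
        else:
            lo = mid + 1  # infeasible: the bound must be looser
    return lo / 2, min_buckets(data, lo)


if __name__ == "__main__":
    error, buckets = best_histogram([1, 2, 3, 10, 11, 12, 30, 31], budget=3)
    print(error, buckets)
```

Each call to `min_buckets` is a single pass over the data, and the binary search makes a number of calls that is logarithmic in the value range, so the budget B does not appear in the running time of this sketch.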