Durable top-k search in document archives

Hou U, L; Mamoulis, N; Berberich, K; Bedathur, S

Publisher Website: 10.1145/1807167.1807228
Scopus: eid_2-s2.0-77954751022

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Durable top-k search in document archives

Title	Durable top-k search in document archives
Authors	Hou U, L; Mamoulis, N; Berberich, K; Bedathur, S
Keywords	document archives temporal queries top-k search
Issue Date	2010
Publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://www.acm.org/sigmod
Citation	The 2010 International Conference on Management of Data (SIGMOD '10), Indianapolis, IN., 6-11 June 2010. In Proceedings of the ACM Conference on Management of Data, 2010, p. 555-566. DOI: http://dx.doi.org/10.1145/1807167.1807228
Abstract	We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions. © 2010 ACM.
Persistent Identifier	http://hdl.handle.net/10722/129564
ISBN	978-1-4503-0032-2
ISSN	0730-8078
SCImago Journal Rankings (2023)	2.640
References	References in Scopus
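
To make the problem statement in the abstract concrete, here is a minimal brute-force sketch of a durable top-k query. It is not the NRA-based or shared-execution technique proposed in the paper, only a naive baseline for illustration. The data model (a dict of versions with half-open validity intervals and precomputed scores), the function names, and the strict reading of "consistently in the top-k" as "in the top-k at every time point" are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical data model: object id -> list of (start, end, score) versions,
# each valid on the half-open time interval [start, end).
Archive = Dict[str, List[Tuple[int, int, float]]]


def score_at(versions: List[Tuple[int, int, float]], t: int) -> Optional[float]:
    """Return the score of the version valid at time t, or None if there is none."""
    for start, end, score in versions:
        if start <= t < end:
            return score
    return None


def top_k_at(archive: Archive, t: int, k: int) -> Set[str]:
    """Ids of the k highest-scoring objects at time t (ties broken by id)."""
    scored = []
    for oid, versions in archive.items():
        s = score_at(versions, t)
        if s is not None:
            scored.append((s, oid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {oid for _, oid in scored[:k]}


def durable_top_k(archive: Archive, t1: int, t2: int, k: int) -> Set[str]:
    """Objects that are in the top-k at every time point of [t1, t2)."""
    # Scores only change at version boundaries, so it suffices to check t1
    # plus every boundary that falls strictly inside the query interval.
    checkpoints = {t1}
    for versions in archive.values():
        for start, end, _ in versions:
            for boundary in (start, end):
                if t1 < boundary < t2:
                    checkpoints.add(boundary)

    result: Optional[Set[str]] = None
    for t in sorted(checkpoints):
        current = top_k_at(archive, t, k)
        result = current if result is None else result & current
        if not result:
            break  # no object can be durable any more
    return result or set()


# Toy archive: "b" and "c" swap ranks at time 5, "a" stays on top.
archive: Archive = {
    "a": [(0, 10, 0.9)],
    "b": [(0, 5, 0.8), (5, 10, 0.1)],
    "c": [(0, 5, 0.2), (5, 10, 0.7)],
}
print(durable_top_k(archive, 0, 10, 2))
```

On the toy archive, only "a" is in the top-2 throughout [0, 10), while both "a" and "b" qualify for the shorter interval [0, 5). This baseline recomputes a full ranking at every version boundary, which is the kind of repeated work the techniques described in the abstract aim to avoid.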

DC Field	Value	Language
dc.contributor.author	Hou U, L	en_HK
dc.contributor.author	Mamoulis, N	en_HK
dc.contributor.author	Berberich, K	en_HK
dc.contributor.author	Bedathur, S	en_HK
dc.date.accessioned	2010-12-23T08:39:19Z	-
dc.date.available	2010-12-23T08:39:19Z	-
dc.date.issued	2010	en_HK
dc.identifier.citation	The 2010 International Conference on Management of Data (SIGMOD '10), Indianapolis, IN., 6-11 June 2010. In Proceedings of the ACM Conference on Management of Data, 2010, p. 555-566	en_HK
dc.identifier.isbn	978-1-4503-0032-2	-
dc.identifier.issn	0730-8078	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/129564	-
dc.description.abstract	We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions. © 2010 ACM.	en_HK
dc.language	eng	en_US
dc.publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://www.acm.org/sigmod	en_HK
dc.relation.ispartof	Proceedings of the ACM SIGMOD International Conference on Management of Data	en_HK
dc.rights	Proceedings of the ACM Conference on Management of Data. Copyright © Association for Computing Machinery.	-
dc.subject	document archives	en_HK
dc.subject	temporal queries	en_HK
dc.subject	top-k search	en_HK
dc.title	Durable top-k search in document archives	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Mamoulis, N:nikos@cs.hku.hk	en_HK
dc.identifier.authority	Mamoulis, N=rp00155	en_HK
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1145/1807167.1807228	en_HK
dc.identifier.scopus	eid_2-s2.0-77954751022	en_HK
dc.identifier.hkuros	176423	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-77954751022&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.spage	555	en_HK
dc.identifier.epage	566	en_HK
dc.publisher.place	United States	en_HK
dc.description.other	The 2010 International Conference on Management of Data (SIGMOD '10), Indianapolis, IN., 6-11 June 2010. In Proceedings of the ACM Conference on Management of Data, 2010, p. 555-566	-
dc.identifier.scopusauthorid	Hou U, L=13605267100	en_HK
dc.identifier.scopusauthorid	Mamoulis, N=6701782749	en_HK
dc.identifier.scopusauthorid	Berberich, K=15130456300	en_HK
dc.identifier.scopusauthorid	Bedathur, S=22833788900	en_HK
dc.identifier.issnl	0730-8078	-
