File Download
 
Links for fulltext
(May Require Subscription)
 
Supplementary

Article: Compressed indexes for dynamic text collections
  • Basic View
  • Metadata View
  • XML View
TitleCompressed indexes for dynamic text collections
 
AuthorsChan, HL2
Hon, WK3
Lam, TW2
Sadakane, K1
 
KeywordsCompressed suffix tree
String matching
 
Issue Date2007
 
PublisherAssociation for Computing Machinery, Inc.
 
CitationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
 
AbstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
 
ISSN1549-6325
2013 Impact Factor: 0.400
2013 SCImago Journal Rankings: 1.519
 
DOIhttp://dx.doi.org/10.1145/1240233.1240244
 
ReferencesReferences in Scopus
 
DC FieldValue
dc.contributor.authorChan, HL
 
dc.contributor.authorHon, WK
 
dc.contributor.authorLam, TW
 
dc.contributor.authorSadakane, K
 
dc.date.accessioned2010-09-06T09:52:02Z
 
dc.date.available2010-09-06T09:52:02Z
 
dc.date.issued2007
 
dc.description.abstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
 
dc.description.natureLink_to_subscribed_fulltext
 
dc.identifier.citationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
 
dc.identifier.citeulike1616929
 
dc.identifier.doihttp://dx.doi.org/10.1145/1240233.1240244
 
dc.identifier.hkuros130770
 
dc.identifier.issn1549-6325
2013 Impact Factor: 0.400
2013 SCImago Journal Rankings: 1.519
 
dc.identifier.issue2
 
dc.identifier.openurl
 
dc.identifier.scopuseid_2-s2.0-34250206975
 
dc.identifier.urihttp://hdl.handle.net/10722/89072
 
dc.identifier.volume3
 
dc.languageeng
 
dc.publisherAssociation for Computing Machinery, Inc.
 
dc.publisher.placeUnited States
 
dc.relation.ispartofACM Transactions on Algorithms
 
dc.relation.referencesReferences in Scopus
 
dc.rightsACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc.
 
dc.subjectCompressed suffix tree
 
dc.subjectString matching
 
dc.titleCompressed indexes for dynamic text collections
 
dc.typeArticle
 
<?xml encoding="utf-8" version="1.0"?>
<item><contributor.author>Chan, HL</contributor.author>
<contributor.author>Hon, WK</contributor.author>
<contributor.author>Lam, TW</contributor.author>
<contributor.author>Sadakane, K</contributor.author>
<date.accessioned>2010-09-06T09:52:02Z</date.accessioned>
<date.available>2010-09-06T09:52:02Z</date.available>
<date.issued>2007</date.issued>
<identifier.citation>Acm Transactions On Algorithms, 2007, v. 3 n. 2</identifier.citation>
<identifier.issn>1549-6325</identifier.issn>
<identifier.uri>http://hdl.handle.net/10722/89072</identifier.uri>
<description.abstract>Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. &#169; 2007 ACM.</description.abstract>
<language>eng</language>
<publisher>Association for Computing Machinery, Inc.</publisher>
<relation.ispartof>ACM Transactions on Algorithms</relation.ispartof>
<rights>ACM Transactions on Algorithms. Copyright &#169; Association for Computing Machinery, Inc.</rights>
<subject>Compressed suffix tree</subject>
<subject>String matching</subject>
<title>Compressed indexes for dynamic text collections</title>
<type>Article</type>
<identifier.openurl>http://library.hku.hk:4550/resserv?sid=HKU:IR&amp;issn=0730-0301&amp;volume=3:2&amp;spage=Article 21, pages 1&amp;epage=29&amp;date=2007&amp;atitle=Compressed+Indexes+for+Dynamic+Text+Collections</identifier.openurl>
<description.nature>Link_to_subscribed_fulltext</description.nature>
<identifier.doi>10.1145/1240233.1240244</identifier.doi>
<identifier.scopus>eid_2-s2.0-34250206975</identifier.scopus>
<identifier.hkuros>130770</identifier.hkuros>
<relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-34250206975&amp;selection=ref&amp;src=s&amp;origin=recordpage</relation.references>
<identifier.volume>3</identifier.volume>
<identifier.issue>2</identifier.issue>
<publisher.place>United States</publisher.place>
<identifier.citeulike>1616929</identifier.citeulike>
</item>
Author Affiliations
  1. Kyushu University
  2. The University of Hong Kong
  3. National Tsing Hua University