Article: Compressed indexes for dynamic text collections

File Download Links for fulltext
(May Require Subscription)
Supplementary
  • Basic View
  • Metadata View
  • XML View
TitleCompressed indexes for dynamic text collections
AuthorsChan, HL2
Hon, WK3
Lam, TW2
Sadakane, K1
KeywordsCompressed suffix tree
String matching
Issue Date2007
PublisherAssociation for Computing Machinery, Inc.
CitationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
AbstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
ISSN1549-6325
2011 Impact Factor: 0.646
2011 SCImago Journal Rankings: 0.051
DOIhttp://dx.doi.org/10.1145/1240233.1240244
ReferencesReferences in Scopus
DC Field
Value
dc.contributor.authorChan, HL
dc.contributor.authorHon, WK
dc.contributor.authorLam, TW
dc.contributor.authorSadakane, K
dc.date.accessioned2010-09-06T09:52:02Z
dc.date.available2010-09-06T09:52:02Z
dc.date.issued2007
dc.description.abstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
dc.description.natureLink_to_subscribed_fulltext
dc.identifier.citationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
dc.identifier.citeulike1616929
dc.identifier.doihttp://dx.doi.org/10.1145/1240233.1240244
dc.identifier.hkuros130770
dc.identifier.issn1549-6325
2011 Impact Factor: 0.646
2011 SCImago Journal Rankings: 0.051
dc.identifier.issue2
dc.identifier.openurl
dc.identifier.scopuseid_2-s2.0-34250206975
dc.identifier.urihttp://hdl.handle.net/10722/89072
dc.identifier.volume3
dc.languageeng
dc.publisherAssociation for Computing Machinery, Inc.
dc.publisher.placeUnited States
dc.relation.ispartofACM Transactions on Algorithms
dc.relation.referencesReferences in Scopus
dc.rightsACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc.
dc.subjectCompressed suffix tree
dc.subjectString matching
dc.titleCompressed indexes for dynamic text collections
dc.typeArticle
Author Affiliations
  1. Kyushu University
  2. The University of Hong Kong
  3. National Tsing Hua University