File Download
 
Links for fulltext
(May Require Subscription)
 
Supplementary

Article: Compressed indexes for dynamic text collections
  • Basic View
  • Metadata View
  • XML View
TitleCompressed indexes for dynamic text collections
 
AuthorsChan, HL2 2
Hon, WK3 3
Lam, TW2 2
Sadakane, K1 1
 
KeywordsCompressed suffix tree
String matching
 
Issue Date2007
 
PublisherAssociation for Computing Machinery, Inc.
 
CitationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
 
AbstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
 
ISSN1549-6325
2012 Impact Factor: 0.54
2012 SCImago Journal Rankings: 1.664
 
DOIhttp://dx.doi.org/10.1145/1240233.1240244
 
ReferencesReferences in Scopus
 
DC FieldValue
dc.contributor.authorChan, HL
 
dc.contributor.authorHon, WK
 
dc.contributor.authorLam, TW
 
dc.contributor.authorSadakane, K
 
dc.date.accessioned2010-09-06T09:52:02Z
 
dc.date.available2010-09-06T09:52:02Z
 
dc.date.issued2007
 
dc.description.abstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
 
dc.description.natureLink_to_subscribed_fulltext
 
dc.identifier.citationAcm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?]
DOI: http://dx.doi.org/10.1145/1240233.1240244
 
dc.identifier.citeulike1616929
 
dc.identifier.doihttp://dx.doi.org/10.1145/1240233.1240244
 
dc.identifier.hkuros130770
 
dc.identifier.issn1549-6325
2012 Impact Factor: 0.54
2012 SCImago Journal Rankings: 1.664
 
dc.identifier.issue2
 
dc.identifier.openurl
 
dc.identifier.scopuseid_2-s2.0-34250206975
 
dc.identifier.urihttp://hdl.handle.net/10722/89072
 
dc.identifier.volume3
 
dc.languageeng
 
dc.publisherAssociation for Computing Machinery, Inc.
 
dc.publisher.placeUnited States
 
dc.relation.ispartofACM Transactions on Algorithms
 
dc.relation.referencesReferences in Scopus
 
dc.rightsACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc.
 
dc.subjectCompressed suffix tree
 
dc.subjectString matching
 
dc.titleCompressed indexes for dynamic text collections
 
dc.typeArticle
 
<?xml encoding="utf-8" version="1.0"?>
<item><contributor.author>Chan, HL</contributor.author>
<contributor.author>Hon, WK</contributor.author>
<contributor.author>Lam, TW</contributor.author>
<contributor.author>Sadakane, K</contributor.author>
<date.accessioned>2010-09-06T09:52:02Z</date.accessioned>
<date.available>2010-09-06T09:52:02Z</date.available>
<date.issued>2007</date.issued>
<identifier.citation>Acm Transactions On Algorithms, 2007, v. 3 n. 2</identifier.citation>
<identifier.issn>1549-6325</identifier.issn>
<identifier.uri>http://hdl.handle.net/10722/89072</identifier.uri>
<description.abstract>Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. &#169; 2007 ACM.</description.abstract>
<language>eng</language>
<publisher>Association for Computing Machinery, Inc.</publisher>
<relation.ispartof>ACM Transactions on Algorithms</relation.ispartof>
<rights>ACM Transactions on Algorithms. Copyright &#169; Association for Computing Machinery, Inc.</rights>
<subject>Compressed suffix tree</subject>
<subject>String matching</subject>
<title>Compressed indexes for dynamic text collections</title>
<type>Article</type>
<identifier.openurl>http://library.hku.hk:4550/resserv?sid=HKU:IR&amp;issn=0730-0301&amp;volume=3:2&amp;spage=Article 21, pages 1&amp;epage=29&amp;date=2007&amp;atitle=Compressed+Indexes+for+Dynamic+Text+Collections</identifier.openurl>
<description.nature>Link_to_subscribed_fulltext</description.nature>
<identifier.doi>10.1145/1240233.1240244</identifier.doi>
<identifier.scopus>eid_2-s2.0-34250206975</identifier.scopus>
<identifier.hkuros>130770</identifier.hkuros>
<relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-34250206975&amp;selection=ref&amp;src=s&amp;origin=recordpage</relation.references>
<identifier.volume>3</identifier.volume>
<identifier.issue>2</identifier.issue>
<publisher.place>United States</publisher.place>
<identifier.citeulike>1616929</identifier.citeulike>
</item>
Author Affiliations
  1. Kyushu University
  2. The University of Hong Kong
  3. National Tsing Hua University