File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Article: Compressed indexes for dynamic text collections

TitleCompressed indexes for dynamic text collections
Authors
KeywordsCompressed suffix tree
String matching
Issue Date2007
PublisherAssociation for Computing Machinery, Inc.
Citation
Acm Transactions On Algorithms, 2007, v. 3 n. 2 How to Cite?
Abstract
Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.
Persistent Identifierhttp://hdl.handle.net/10722/89072
ISSN
2013 Impact Factor: 0.400
2013 SCImago Journal Rankings: 1.519
References

 

Author Affiliations
  1. Kyushu University
  2. The University of Hong Kong
  3. National Tsing Hua University
DC FieldValueLanguage
dc.contributor.authorChan, HLen_HK
dc.contributor.authorHon, WKen_HK
dc.contributor.authorLam, TWen_HK
dc.contributor.authorSadakane, Ken_HK
dc.date.accessioned2010-09-06T09:52:02Z-
dc.date.available2010-09-06T09:52:02Z-
dc.date.issued2007en_HK
dc.identifier.citationAcm Transactions On Algorithms, 2007, v. 3 n. 2en_HK
dc.identifier.issn1549-6325en_HK
dc.identifier.urihttp://hdl.handle.net/10722/89072-
dc.description.abstractLet T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.en_HK
dc.languageengen_HK
dc.publisherAssociation for Computing Machinery, Inc.en_HK
dc.relation.ispartofACM Transactions on Algorithmsen_HK
dc.rightsACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc.en_HK
dc.subjectCompressed suffix treeen_HK
dc.subjectString matchingen_HK
dc.titleCompressed indexes for dynamic text collectionsen_HK
dc.typeArticleen_HK
dc.identifier.openurlhttp://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0730-0301&volume=3:2&spage=Article 21, pages 1&epage=29&date=2007&atitle=Compressed+Indexes+for+Dynamic+Text+Collectionsen_HK
dc.identifier.emailChan, HL:hlchan@cs.hku.hken_HK
dc.identifier.emailLam, TW:twlam@cs.hku.hken_HK
dc.identifier.authorityChan, HL=rp01310en_HK
dc.identifier.authorityLam, TW=rp00135en_HK
dc.description.naturelink_to_subscribed_fulltext-
dc.identifier.doi10.1145/1240233.1240244en_HK
dc.identifier.scopuseid_2-s2.0-34250206975en_HK
dc.identifier.hkuros130770en_HK
dc.relation.referenceshttp://www.scopus.com/mlt/select.url?eid=2-s2.0-34250206975&selection=ref&src=s&origin=recordpageen_HK
dc.identifier.volume3en_HK
dc.identifier.issue2en_HK
dc.publisher.placeUnited Statesen_HK
dc.identifier.scopusauthoridChan, HL=7403402384en_HK
dc.identifier.scopusauthoridHon, WK=7004282818en_HK
dc.identifier.scopusauthoridLam, TW=7202523165en_HK
dc.identifier.scopusauthoridSadakane, K=7005716583en_HK
dc.identifier.citeulike1616929-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats