Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/1240233.1240244
- Scopus: eid_2-s2.0-34250206975
- WOS: WOS:000494447000011
Article: Compressed indexes for dynamic text collections
Title | Compressed indexes for dynamic text collections |
---|---|
Authors | Chan, HL; Hon, WK; Lam, TW; Sadakane, K |
Keywords | Compressed suffix tree String matching |
Issue Date | 2007 |
Publisher | Association for Computing Machinery, Inc. |
Citation | ACM Transactions on Algorithms, 2007, v. 3 n. 2, article no. 21 |
Abstract | Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T| log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log² n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log² d) time when a pattern P is inserted or deleted from D; also, the index supports searching the occurrences of all patterns of D in any text T in O((|T| + occ) log² d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T| log² n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts.
In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n / log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM. |
Persistent Identifier | http://hdl.handle.net/10722/89072 |
ISSN | 1549-6325 (2023 Impact Factor: 0.9; 2023 SCImago Journal Rankings: 1.555) |
ISI Accession Number ID | WOS:000494447000011 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chan, HL | en_HK |
dc.contributor.author | Hon, WK | en_HK |
dc.contributor.author | Lam, TW | en_HK |
dc.contributor.author | Sadakane, K | en_HK |
dc.date.accessioned | 2010-09-06T09:52:02Z | - |
dc.date.available | 2010-09-06T09:52:02Z | - |
dc.date.issued | 2007 | en_HK |
dc.identifier.citation | ACM Transactions On Algorithms, 2007, v. 3 n. 2, article no. 21 | en_HK |
dc.identifier.issn | 1549-6325 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/89072 | - |
dc.language | eng | en_HK |
dc.publisher | Association for Computing Machinery, Inc. | en_HK |
dc.relation.ispartof | ACM Transactions on Algorithms | en_HK |
dc.rights | ACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc. | en_HK |
dc.subject | Compressed suffix tree | en_HK |
dc.subject | String matching | en_HK |
dc.title | Compressed indexes for dynamic text collections | en_HK |
dc.type | Article | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0730-0301&volume=3:2&spage=Article 21, pages 1&epage=29&date=2007&atitle=Compressed+Indexes+for+Dynamic+Text+Collections | en_HK |
dc.identifier.email | Chan, HL:hlchan@cs.hku.hk | en_HK |
dc.identifier.email | Lam, TW:twlam@cs.hku.hk | en_HK |
dc.identifier.authority | Chan, HL=rp01310 | en_HK |
dc.identifier.authority | Lam, TW=rp00135 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/1240233.1240244 | en_HK |
dc.identifier.scopus | eid_2-s2.0-34250206975 | en_HK |
dc.identifier.hkuros | 130770 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-34250206975&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 3 | en_HK |
dc.identifier.issue | 2 | en_HK |
dc.identifier.spage | article no. 21 | - |
dc.identifier.epage | article no. 21 | - |
dc.identifier.isi | WOS:000494447000011 | - |
dc.publisher.place | United States | en_HK |
dc.identifier.scopusauthorid | Chan, HL=7403402384 | en_HK |
dc.identifier.scopusauthorid | Hon, WK=7004282818 | en_HK |
dc.identifier.scopusauthorid | Lam, TW=7202523165 | en_HK |
dc.identifier.scopusauthorid | Sadakane, K=7005716583 | en_HK |
dc.identifier.citeulike | 1616929 | - |
dc.identifier.issnl | 1549-6325 | - |