Article: Compressed indexes for dynamic text collections
| Title | Compressed indexes for dynamic text collections |
|---|---|
| Authors | Chan, HL2 Hon, WK3 Lam, TW2 Sadakane, K1 |
| Keywords | Compressed suffix tree String matching |
| Issue Date | 2007 |
| Publisher | Association for Computing Machinery, Inc. |
| Citation | Acm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?] DOI: http://dx.doi.org/10.1145/1240233.1240244 |
| Abstract | Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM. |
| ISSN | 1549-6325 2011 Impact Factor: 0.646 2011 SCImago Journal Rankings: 0.051 |
| DOI | http://dx.doi.org/10.1145/1240233.1240244 |
| References | References in Scopus |
| dc.contributor.author | Chan, HL |
|---|---|
| dc.contributor.author | Hon, WK |
| dc.contributor.author | Lam, TW |
| dc.contributor.author | Sadakane, K |
| dc.date.accessioned | 2010-09-06T09:52:02Z |
| dc.date.available | 2010-09-06T09:52:02Z |
| dc.date.issued | 2007 |
| dc.description.abstract | Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM. |
| dc.description.nature | Link_to_subscribed_fulltext |
| dc.identifier.citation | Acm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?] DOI: http://dx.doi.org/10.1145/1240233.1240244 |
| dc.identifier.citeulike | 1616929 |
| dc.identifier.doi | http://dx.doi.org/10.1145/1240233.1240244 |
| dc.identifier.hkuros | 130770 |
| dc.identifier.issn | 1549-6325 2011 Impact Factor: 0.646 2011 SCImago Journal Rankings: 0.051 |
| dc.identifier.issue | 2 |
| dc.identifier.openurl | ![]() |
| dc.identifier.scopus | eid_2-s2.0-34250206975 |
| dc.identifier.uri | http://hdl.handle.net/10722/89072 |
| dc.identifier.volume | 3 |
| dc.language | eng |
| dc.publisher | Association for Computing Machinery, Inc. |
| dc.publisher.place | United States |
| dc.relation.ispartof | ACM Transactions on Algorithms |
| dc.relation.references | References in Scopus |
| dc.rights | ACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc. |
| dc.subject | Compressed suffix tree |
| dc.subject | String matching |
| dc.title | Compressed indexes for dynamic text collections |
| dc.type | Article |
Author Affiliations
- Kyushu University
- The University of Hong Kong
- National Tsing Hua University


