**Article:**Compressed indexes for dynamic text collections

Title | Compressed indexes for dynamic text collections |
---|---|

Authors | Chan, HL2 Hon, WK3 Lam, TW2 Sadakane, K1 |

Keywords | Compressed suffix tree String matching |

Issue Date | 2007 |

Publisher | Association for Computing Machinery, Inc. |

Citation | Acm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?] DOI: http://dx.doi.org/10.1145/1240233.1240244 |

Abstract | Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM. |

ISSN | 1549-6325 2012 Impact Factor: 0.54 2012 SCImago Journal Rankings: 1.664 |

DOI | http://dx.doi.org/10.1145/1240233.1240244 |

References | References in Scopus |

DC Field | Value |
---|---|

dc.contributor.author | Chan, HL |

dc.contributor.author | Hon, WK |

dc.contributor.author | Lam, TW |

dc.contributor.author | Sadakane, K |

dc.date.accessioned | 2010-09-06T09:52:02Z |

dc.date.available | 2010-09-06T09:52:02Z |

dc.date.issued | 2007 |

dc.description.abstract | Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM. |

dc.description.nature | Link_to_subscribed_fulltext |

dc.identifier.citation | Acm Transactions On Algorithms, 2007, v. 3 n. 2 [How to Cite?] DOI: http://dx.doi.org/10.1145/1240233.1240244 |

dc.identifier.citeulike | 1616929 |

dc.identifier.doi | http://dx.doi.org/10.1145/1240233.1240244 |

dc.identifier.hkuros | 130770 |

dc.identifier.issn | 1549-6325 2012 Impact Factor: 0.54 2012 SCImago Journal Rankings: 1.664 |

dc.identifier.issue | 2 |

dc.identifier.openurl | |

dc.identifier.scopus | eid_2-s2.0-34250206975 |

dc.identifier.uri | http://hdl.handle.net/10722/89072 |

dc.identifier.volume | 3 |

dc.language | eng |

dc.publisher | Association for Computing Machinery, Inc. |

dc.publisher.place | United States |

dc.relation.ispartof | ACM Transactions on Algorithms |

dc.relation.references | References in Scopus |

dc.rights | ACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc. |

dc.subject | Compressed suffix tree |

dc.subject | String matching |

dc.title | Compressed indexes for dynamic text collections |

dc.type | Article |

<?xml encoding="utf-8" version="1.0"?> <item><contributor.author>Chan, HL</contributor.author> <contributor.author>Hon, WK</contributor.author> <contributor.author>Lam, TW</contributor.author> <contributor.author>Sadakane, K</contributor.author> <date.accessioned>2010-09-06T09:52:02Z</date.accessioned> <date.available>2010-09-06T09:52:02Z</date.available> <date.issued>2007</date.issued> <identifier.citation>Acm Transactions On Algorithms, 2007, v. 3 n. 2</identifier.citation> <identifier.issn>1549-6325</identifier.issn> <identifier.uri>http://hdl.handle.net/10722/89072</identifier.uri> <description.abstract>Let T be a string with n characters over an alphabet of constant size. A recent breakthrough on compressed indexing allows us to build an index for T in optimal space (i.e., O(n) bits), while supporting very efficient pattern matching [Ferragina and Manzini 2000; Grossi and Vitter 2000]. Yet the compressed nature of such indexes also makes them difficult to update dynamically. This article extends the work on optimal-space indexing to a dynamic collection of texts. Our first result is a compressed solution to the library management problem, where we show an index of O(n) bits for a text collection L of total length n, which can be updated in O(|T | log n) time when a text T is inserted or deleted from L; also, the index supports searching the occurrences of any pattern P in all texts in L in O(|P| log n + occ log2 n) time, where occ is the number of occurrences. Our second result is a compressed solution to the dictionary matching problem, where we show an index of O(d) bits for a pattern collection D of total length d, which can be updated in O(|P| log2 d) time when a pattern P is inserted or deleted fromD; also, the index supports searching the occurrences of all patterns ofD in any text T in O((|T |+occ) log2 d) time. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al. [1995], the compact solution increases the query time by roughly a factor of log d only. The solution to the dictionary matching problem is based on a new compressed representation of a suffix tree. Precisely, we give an O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T | log2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. In the study of the aforementioned result, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/ log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution. © 2007 ACM.</description.abstract> <language>eng</language> <publisher>Association for Computing Machinery, Inc.</publisher> <relation.ispartof>ACM Transactions on Algorithms</relation.ispartof> <rights>ACM Transactions on Algorithms. Copyright © Association for Computing Machinery, Inc.</rights> <subject>Compressed suffix tree</subject> <subject>String matching</subject> <title>Compressed indexes for dynamic text collections</title> <type>Article</type> <identifier.openurl>http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0730-0301&volume=3:2&spage=Article 21, pages 1&epage=29&date=2007&atitle=Compressed+Indexes+for+Dynamic+Text+Collections</identifier.openurl> <description.nature>Link_to_subscribed_fulltext</description.nature> <identifier.doi>10.1145/1240233.1240244</identifier.doi> <identifier.scopus>eid_2-s2.0-34250206975</identifier.scopus> <identifier.hkuros>130770</identifier.hkuros> <relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-34250206975&selection=ref&src=s&origin=recordpage</relation.references> <identifier.volume>3</identifier.volume> <identifier.issue>2</identifier.issue> <publisher.place>United States</publisher.place> <identifier.citeulike>1616929</identifier.citeulike> </item>

Author Affiliations

- Kyushu University
- The University of Hong Kong
- National Tsing Hua University