**Conference Paper:**Improved approximate string matching using compressed suffix data structures

Title | Improved approximate string matching using compressed suffix data structures |
---|---|

Authors | Lam, TW1 Sung, WK2 Wong, SS2 |

Issue Date | 2008 |

Publisher | Springer New York LLC. The Journal's web site is located at http://link.springer.de/link/service/journals/00453/index.htm |

Citation | Algorithmica (New York), 2008, v. 51 n. 3, p. 298-314 [How to Cite?] DOI: http://dx.doi.org/10.1007/s00453-007-9104-8 |

Abstract | Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(n √log n log |A|)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|m log log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log2 n) bits. The space of our data structure can be further reduced to O(n log |A|) with the query time increasing by a factor of log ε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k mk (k + log log n) + occ) and O(logε n(|A|k mk (k + log log n) + occ)) time using an O(n √log n log |A|)-bit and an O(n log |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2 √log n) for the O(n√log n log |A|)-bit space data structure. © 2007 Springer Science+Business Media, LLC. |

ISSN | 0178-4617 2012 Impact Factor: 0.488 2012 SCImago Journal Rankings: 1.214 |

DOI | http://dx.doi.org/10.1007/s00453-007-9104-8 |

ISI Accession Number ID | WOS:000255874200005 |

References | References in Scopus |

DC Field | Value |
---|---|

dc.contributor.author | Lam, TW |

dc.contributor.author | Sung, WK |

dc.contributor.author | Wong, SS |

dc.date.accessioned | 2012-06-26T06:30:42Z |

dc.date.available | 2012-06-26T06:30:42Z |

dc.date.issued | 2008 |

dc.description.abstract | Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(n √log n log |A|)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|m log log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log2 n) bits. The space of our data structure can be further reduced to O(n log |A|) with the query time increasing by a factor of log ε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k mk (k + log log n) + occ) and O(logε n(|A|k mk (k + log log n) + occ)) time using an O(n √log n log |A|)-bit and an O(n log |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2 √log n) for the O(n√log n log |A|)-bit space data structure. © 2007 Springer Science+Business Media, LLC. |

dc.description.nature | Link_to_subscribed_fulltext |

dc.identifier.citation | Algorithmica (New York), 2008, v. 51 n. 3, p. 298-314 [How to Cite?] DOI: http://dx.doi.org/10.1007/s00453-007-9104-8 |

dc.identifier.doi | http://dx.doi.org/10.1007/s00453-007-9104-8 |

dc.identifier.epage | 314 |

dc.identifier.isi | WOS:000255874200005 |

dc.identifier.issn | 0178-4617 2012 Impact Factor: 0.488 2012 SCImago Journal Rankings: 1.214 |

dc.identifier.issue | 3 |

dc.identifier.scopus | eid_2-s2.0-43949112336 |

dc.identifier.spage | 298 |

dc.identifier.uri | http://hdl.handle.net/10722/151910 |

dc.identifier.volume | 51 |

dc.language | eng |

dc.publisher | Springer New York LLC. The Journal's web site is located at http://link.springer.de/link/service/journals/00453/index.htm |

dc.publisher.place | United States |

dc.relation.ispartof | Algorithmica (New York) |

dc.relation.references | References in Scopus |

dc.title | Improved approximate string matching using compressed suffix data structures |

dc.type | Conference_Paper |

<?xml encoding="utf-8" version="1.0"?> <item><contributor.author>Lam, TW</contributor.author> <contributor.author>Sung, WK</contributor.author> <contributor.author>Wong, SS</contributor.author> <date.accessioned>2012-06-26T06:30:42Z</date.accessioned> <date.available>2012-06-26T06:30:42Z</date.available> <date.issued>2008</date.issued> <identifier.citation>Algorithmica (New York), 2008, v. 51 n. 3, p. 298-314</identifier.citation> <identifier.issn>0178-4617</identifier.issn> <identifier.uri>http://hdl.handle.net/10722/151910</identifier.uri> <description.abstract>Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors. In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an O(n √log n log |A|)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(|A|m log log n+occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log2 n) bits. The space of our data structure can be further reduced to O(n log |A|) with the query time increasing by a factor of log ε n, for 0 < ε ≤ 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(|A| k mk (k + log log n) + occ) and O(logε n(|A|k mk (k + log log n) + occ)) time using an O(n √log n log |A|)-bit and an O(n log |A|)-bit indexing data structures, respectively. We assume that the alphabet size |A| is bounded by O(2 √log n) for the O(n√log n log |A|)-bit space data structure. © 2007 Springer Science+Business Media, LLC.</description.abstract> <language>eng</language> <publisher>Springer New York LLC. The Journal's web site is located at http://link.springer.de/link/service/journals/00453/index.htm</publisher> <relation.ispartof>Algorithmica (New York)</relation.ispartof> <title>Improved approximate string matching using compressed suffix data structures</title> <type>Conference_Paper</type> <description.nature>Link_to_subscribed_fulltext</description.nature> <identifier.doi>10.1007/s00453-007-9104-8</identifier.doi> <identifier.scopus>eid_2-s2.0-43949112336</identifier.scopus> <relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-43949112336&selection=ref&src=s&origin=recordpage</relation.references> <identifier.volume>51</identifier.volume> <identifier.issue>3</identifier.issue> <identifier.spage>298</identifier.spage> <identifier.epage>314</identifier.epage> <identifier.isi>WOS:000255874200005</identifier.isi> <publisher.place>United States</publisher.place> </item>

Author Affiliations

- The University of Hong Kong
- National University of Singapore