Article: Approximate string matching using compressed suffix arrays

File Download Links for fulltext
(May Require Subscription)
Supplementary

  • Basic View
  • Metadata View
  • XML View
TitleApproximate string matching using compressed suffix arrays
AuthorsHuynh, TND2
Hon, WK1
Lam, TW1
Sung, WK2
Issue Date2006
PublisherElsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs
CitationTheoretical Computer Science, 2006, v. 352 n. 1-3, p. 240-249 [How to Cite?]
DOI: http://dx.doi.org/10.1016/j.tcs.2005.11.022
AbstractLet T be a text of length n and P be a pattern of length m, both strings over a fixed finite alphabet A. The k-difference (k-mismatch, respectively) problem is to find all occurrences of P in T that have edit distance (Hamming distance, respectively) at most k from P. In this paper we investigate a well-studied case in which T is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an O(nlogn) bits indexing data structure with O(|A|kmk·max(k,logn) +occ) query time, where occ is the number of occurrences. The best previous result requires O(nlogn) bits indexing data structure and gives O(|A|kmk+2+occ) query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to O(n) bits, while increasing the query time by an O(logn) factor only. © 2005 Elsevier B.V. All right reserved.
ISSN0304-3975
2011 Impact Factor: 0.665
2011 SCImago Journal Rankings: 0.055
DOIhttp://dx.doi.org/10.1016/j.tcs.2005.11.022
ReferencesReferences in Scopus
DC Field
Value
dc.contributor.authorHuynh, TND
dc.contributor.authorHon, WK
dc.contributor.authorLam, TW
dc.contributor.authorSung, WK
dc.date.accessioned2012-06-26T06:37:14Z
dc.date.available2012-06-26T06:37:14Z
dc.date.issued2006
dc.description.abstractLet T be a text of length n and P be a pattern of length m, both strings over a fixed finite alphabet A. The k-difference (k-mismatch, respectively) problem is to find all occurrences of P in T that have edit distance (Hamming distance, respectively) at most k from P. In this paper we investigate a well-studied case in which T is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an O(nlogn) bits indexing data structure with O(|A|kmk·max(k,logn) +occ) query time, where occ is the number of occurrences. The best previous result requires O(nlogn) bits indexing data structure and gives O(|A|kmk+2+occ) query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to O(n) bits, while increasing the query time by an O(logn) factor only. © 2005 Elsevier B.V. All right reserved.
dc.description.natureLink_to_subscribed_fulltext
dc.identifier.citationTheoretical Computer Science, 2006, v. 352 n. 1-3, p. 240-249 [How to Cite?]
DOI: http://dx.doi.org/10.1016/j.tcs.2005.11.022
dc.identifier.doihttp://dx.doi.org/10.1016/j.tcs.2005.11.022
dc.identifier.epage249
dc.identifier.isiWOS:000235826900018
dc.identifier.issn0304-3975
2011 Impact Factor: 0.665
2011 SCImago Journal Rankings: 0.055
dc.identifier.issue1-3
dc.identifier.scopuseid_2-s2.0-32644436921
dc.identifier.spage240
dc.identifier.urihttp://hdl.handle.net/10722/152330
dc.identifier.volume352
dc.languageeng
dc.publisherElsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs
dc.publisher.placeNetherlands
dc.relation.ispartofTheoretical Computer Science
dc.relation.referencesReferences in Scopus
dc.titleApproximate string matching using compressed suffix arrays
dc.typeArticle
Author Affiliations
  1. The University of Hong Kong
  2. National University of Singapore