File Download
 
Links for fulltext
(May Require Subscription)
 
Supplementary

Article: Cache-oblivious index for approximate string matching
  • Basic View
  • Metadata View
  • XML View
TitleCache-oblivious index for approximate string matching
 
AuthorsHon, WK3
Lam, TW2
Shah, R4
Tam, SL2
Vitter, JS1
 
KeywordsApproximate queries
Cache-oblivious
I/O model
Indexing
String matching
 
Issue Date2011
 
PublisherElsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs
 
CitationTheoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588 [How to Cite?]
DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004
 
AbstractThis paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . © 2011 Elsevier B.V. All rights reserved.
 
ISSN0304-3975
2013 Impact Factor: 0.516
2013 SCImago Journal Rankings: 0.932
 
DOIhttp://dx.doi.org/10.1016/j.tcs.2011.03.004
 
ISI Accession Number IDWOS:000292077200015
Funding AgencyGrant Number
Taiwan NSC96-2221-E-007-082
99-2221-E-007-123
Hong Kong RGC7140/06E
US NSFCCF-0621457
CCF-1017623
Funding Information:

A preliminary version appears in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 40-51, 2007. This research is supported in part by Taiwan NSC Grants 96-2221-E-007-082 and 99-2221-E-007-123 (Wing-Kai Hon), Hong Kong RGC Grant 7140/06E (Tak-Wah Lam), and US NSF Grants CCF-0621457 and CCF-1017623 (Rahul Shah and Jeffrey Scott Vitter).

 
ReferencesReferences in Scopus
 
DC FieldValue
dc.contributor.authorHon, WK
 
dc.contributor.authorLam, TW
 
dc.contributor.authorShah, R
 
dc.contributor.authorTam, SL
 
dc.contributor.authorVitter, JS
 
dc.date.accessioned2011-09-23T06:19:20Z
 
dc.date.available2011-09-23T06:19:20Z
 
dc.date.issued2011
 
dc.description.abstractThis paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . © 2011 Elsevier B.V. All rights reserved.
 
dc.description.naturepostprint
 
dc.identifier.citationTheoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588 [How to Cite?]
DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004
 
dc.identifier.citeulike9193876
 
dc.identifier.doihttp://dx.doi.org/10.1016/j.tcs.2011.03.004
 
dc.identifier.epage3588
 
dc.identifier.hkuros192200
 
dc.identifier.isiWOS:000292077200015
Funding AgencyGrant Number
Taiwan NSC96-2221-E-007-082
99-2221-E-007-123
Hong Kong RGC7140/06E
US NSFCCF-0621457
CCF-1017623
Funding Information:

A preliminary version appears in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 40-51, 2007. This research is supported in part by Taiwan NSC Grants 96-2221-E-007-082 and 99-2221-E-007-123 (Wing-Kai Hon), Hong Kong RGC Grant 7140/06E (Tak-Wah Lam), and US NSF Grants CCF-0621457 and CCF-1017623 (Rahul Shah and Jeffrey Scott Vitter).

 
dc.identifier.issn0304-3975
2013 Impact Factor: 0.516
2013 SCImago Journal Rankings: 0.932
 
dc.identifier.issue29
 
dc.identifier.openurl
 
dc.identifier.scopuseid_2-s2.0-79956348246
 
dc.identifier.spage3579
 
dc.identifier.urihttp://hdl.handle.net/10722/140789
 
dc.identifier.volume412
 
dc.languageeng
 
dc.publisherElsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs
 
dc.publisher.placeNetherlands
 
dc.relation.ispartofTheoretical Computer Science
 
dc.relation.referencesReferences in Scopus
 
dc.rightsNOTICE: this is the author’s version of a work that was accepted for publication in Theoretical Computer Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588. DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004
 
dc.rightsCreative Commons: Attribution 3.0 Hong Kong License
 
dc.subjectApproximate queries
 
dc.subjectCache-oblivious
 
dc.subjectI/O model
 
dc.subjectIndexing
 
dc.subjectString matching
 
dc.titleCache-oblivious index for approximate string matching
 
dc.typeArticle
 
<?xml encoding="utf-8" version="1.0"?>
<item><contributor.author>Hon, WK</contributor.author>
<contributor.author>Lam, TW</contributor.author>
<contributor.author>Shah, R</contributor.author>
<contributor.author>Tam, SL</contributor.author>
<contributor.author>Vitter, JS</contributor.author>
<date.accessioned>2011-09-23T06:19:20Z</date.accessioned>
<date.available>2011-09-23T06:19:20Z</date.available>
<date.issued>2011</date.issued>
<identifier.citation>Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588</identifier.citation>
<identifier.issn>0304-3975</identifier.issn>
<identifier.uri>http://hdl.handle.net/10722/140789</identifier.uri>
<description.abstract>This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require &#937; (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . &#169; 2011 Elsevier B.V. All rights reserved.</description.abstract>
<language>eng</language>
<publisher>Elsevier BV. The Journal&apos;s web site is located at http://www.elsevier.com/locate/tcs</publisher>
<relation.ispartof>Theoretical Computer Science</relation.ispartof>
<rights>NOTICE: this is the author&#8217;s version of a work that was accepted for publication in Theoretical Computer Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588. DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004</rights>
<rights>Creative Commons: Attribution 3.0 Hong Kong License</rights>
<subject>Approximate queries</subject>
<subject>Cache-oblivious</subject>
<subject>I/O model</subject>
<subject>Indexing</subject>
<subject>String matching</subject>
<title>Cache-oblivious index for approximate string matching</title>
<type>Article</type>
<identifier.openurl>http://library.hku.hk:4550/resserv?sid=HKU:IR&amp;issn=0304-3975&amp;volume=412&amp;issue=29&amp;spage=3579&amp;epage=3588&amp;date=2011&amp;atitle=Cache-oblivious+index+for+approximate+string+matching</identifier.openurl>
<description.nature>postprint</description.nature>
<identifier.doi>10.1016/j.tcs.2011.03.004</identifier.doi>
<identifier.scopus>eid_2-s2.0-79956348246</identifier.scopus>
<identifier.hkuros>192200</identifier.hkuros>
<relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-79956348246&amp;selection=ref&amp;src=s&amp;origin=recordpage</relation.references>
<identifier.volume>412</identifier.volume>
<identifier.issue>29</identifier.issue>
<identifier.spage>3579</identifier.spage>
<identifier.epage>3588</identifier.epage>
<identifier.isi>WOS:000292077200015</identifier.isi>
<publisher.place>Netherlands</publisher.place>
<identifier.citeulike>9193876</identifier.citeulike>
<bitstream.url>http://hub.hku.hk/bitstream/10722/140789/1/Content.pdf</bitstream.url>
</item>
Author Affiliations
  1. University of Kansas
  2. The University of Hong Kong
  3. National Tsing Hua University
  4. Louisiana State University