Article: Cache-oblivious index for approximate string matching
| Title | Cache-oblivious index for approximate string matching | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Authors | Hon, WK3 Lam, TW2 Shah, R4 Tam, SL2 Vitter, JS1 | ||||||||
| Keywords | Approximate queries Cache-oblivious I/O model Indexing String matching | ||||||||
| Issue Date | 2011 | ||||||||
| Publisher | Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs | ||||||||
| Citation | Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588 [How to Cite?] DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004 | ||||||||
| Abstract | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . © 2011 Elsevier B.V. All rights reserved. | ||||||||
| ISSN | 0304-3975 2011 Impact Factor: 0.665 2011 SCImago Journal Rankings: 0.055 | ||||||||
| DOI | http://dx.doi.org/10.1016/j.tcs.2011.03.004 | ||||||||
| ISI Accession Number ID | WOS:000292077200015
Funding Information: A preliminary version appears in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 40-51, 2007. This research is supported in part by Taiwan NSC Grants 96-2221-E-007-082 and 99-2221-E-007-123 (Wing-Kai Hon), Hong Kong RGC Grant 7140/06E (Tak-Wah Lam), and US NSF Grants CCF-0621457 and CCF-1017623 (Rahul Shah and Jeffrey Scott Vitter). | ||||||||
| References | References in Scopus |
| dc.contributor.author | Hon, WK | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| dc.contributor.author | Lam, TW | ||||||||
| dc.contributor.author | Shah, R | ||||||||
| dc.contributor.author | Tam, SL | ||||||||
| dc.contributor.author | Vitter, JS | ||||||||
| dc.date.accessioned | 2011-09-23T06:19:20Z | ||||||||
| dc.date.available | 2011-09-23T06:19:20Z | ||||||||
| dc.date.issued | 2011 | ||||||||
| dc.description.abstract | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlog kn)B) disk pages and finds all k-error matches with O((|P|+occ)B+log knloglog Bn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω (|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)B) disk pages, and the I/O complexity is O((|P|+occ)B+log k(k+1)nloglogn) . © 2011 Elsevier B.V. All rights reserved. | ||||||||
| dc.description.nature | postprint | ||||||||
| dc.identifier.citation | Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588 [How to Cite?] DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004 | ||||||||
| dc.identifier.citeulike | 9193876 | ||||||||
| dc.identifier.doi | http://dx.doi.org/10.1016/j.tcs.2011.03.004 | ||||||||
| dc.identifier.epage | 3588 | ||||||||
| dc.identifier.hkuros | 192200 | ||||||||
| dc.identifier.isi | WOS:000292077200015
Funding Information: A preliminary version appears in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pages 40-51, 2007. This research is supported in part by Taiwan NSC Grants 96-2221-E-007-082 and 99-2221-E-007-123 (Wing-Kai Hon), Hong Kong RGC Grant 7140/06E (Tak-Wah Lam), and US NSF Grants CCF-0621457 and CCF-1017623 (Rahul Shah and Jeffrey Scott Vitter). | ||||||||
| dc.identifier.issn | 0304-3975 2011 Impact Factor: 0.665 2011 SCImago Journal Rankings: 0.055 | ||||||||
| dc.identifier.issue | 29 | ||||||||
| dc.identifier.openurl | ![]() | ||||||||
| dc.identifier.scopus | eid_2-s2.0-79956348246 | ||||||||
| dc.identifier.spage | 3579 | ||||||||
| dc.identifier.uri | http://hdl.handle.net/10722/140789 | ||||||||
| dc.identifier.volume | 412 | ||||||||
| dc.language | eng | ||||||||
| dc.publisher | Elsevier BV. The Journal's web site is located at http://www.elsevier.com/locate/tcs | ||||||||
| dc.publisher.place | Netherlands | ||||||||
| dc.relation.ispartof | Theoretical Computer Science | ||||||||
| dc.relation.references | References in Scopus | ||||||||
| dc.rights | NOTICE: this is the author’s version of a work that was accepted for publication in Theoretical Computer Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Theoretical Computer Science, 2011, v. 412 n. 29, p. 3579-3588. DOI: http://dx.doi.org/10.1016/j.tcs.2011.03.004 | ||||||||
| dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | ||||||||
| dc.subject | Approximate queries | ||||||||
| dc.subject | Cache-oblivious | ||||||||
| dc.subject | I/O model | ||||||||
| dc.subject | Indexing | ||||||||
| dc.subject | String matching | ||||||||
| dc.title | Cache-oblivious index for approximate string matching | ||||||||
| dc.type | Article |
Author Affiliations
- University of Kansas
- The University of Hong Kong
- National Tsing Hua University
- Louisiana State University


