Conference Paper: Cache-oblivious index for approximate string matching
| Title | Cache-oblivious index for approximate string matching |
|---|---|
| Authors | Hon, WK (2); Lam, TW (1); Shah, R (3); Tam, SL (1); Vitter, JS (3) |
| Issue Date | 2007 |
| Publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ |
| Citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2007, v. 4580 LNCS, p. 40-51 |
| Abstract | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n \log^{k} n)/B)$ disk pages and finds all k-error matches with $O((\lvert P \rvert + \mathit{occ})/B + \log^{k} n \log\log_{B} n)$ I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require $\Omega(\lvert P \rvert + \mathit{occ} + \mathrm{poly}(\log n))$ I/Os. The second index reduces the space to $O((n \log n)/B)$ disk pages, and the I/O complexity is $O((\lvert P \rvert + \mathit{occ})/B + \log^{k(k+1)} n \log\log n)$. © Springer-Verlag Berlin Heidelberg 2007. |
| ISSN | 0302-9743 (2011 SCImago Journal Rankings: 0.034) |
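To make the problem statement in the abstract concrete, the following is a minimal sketch of what a "k-error match" is, assuming (as is conventional, though the abstract does not spell it out) that k errors means edit distance at most k. It is a plain online dynamic-programming scan over T, not the cache-oblivious index the paper describes; the function name `k_error_match_ends` is hypothetical and chosen only for illustration.

```python
def k_error_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return every end position j (1-based) such that some substring of
    `text` ending at j is within edit distance k of `pattern`.

    Assumption: "k-error" is read as edit distance (insertions, deletions,
    substitutions). This is a baseline scan, not an index.
    """
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and any substring
    # of the text that ends at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]   # value for (i-1, j-1)
        col[0] = 0           # a match may start at any text position
        for i in range(1, m + 1):
            saved = col[i]   # value for (i, j-1), needed as next diagonal
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         saved + 1,         # extra character in the text
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = saved
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_error_match_ends("abcdefg", "bxd", 1))
```

This scan reads all of T and takes O(n|P|) time for every query, which is the cost an index for approximate matching is meant to avoid.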

| dc.contributor.author | Hon, WK |
|---|---|
| dc.contributor.author | Lam, TW |
| dc.contributor.author | Shah, R |
| dc.contributor.author | Tam, SL |
| dc.contributor.author | Vitter, JS |
| dc.date.accessioned | 2010-09-25T14:52:30Z |
| dc.date.available | 2010-09-25T14:52:30Z |
| dc.date.issued | 2007 |
| dc.description.abstract | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n \log^{k} n)/B)$ disk pages and finds all k-error matches with $O((\lvert P \rvert + \mathit{occ})/B + \log^{k} n \log\log_{B} n)$ I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require $\Omega(\lvert P \rvert + \mathit{occ} + \mathrm{poly}(\log n))$ I/Os. The second index reduces the space to $O((n \log n)/B)$ disk pages, and the I/O complexity is $O((\lvert P \rvert + \mathit{occ})/B + \log^{k(k+1)} n \log\log n)$. © Springer-Verlag Berlin Heidelberg 2007. |
| dc.description.nature | Link_to_subscribed_fulltext |
| dc.identifier.citation | Lecture Notes In Computer Science (Including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics), 2007, v. 4580 LNCS, p. 40-51 |
| dc.identifier.epage | 51 |
| dc.identifier.hkuros | 128193 |
| dc.identifier.issn | 0302-9743 (2011 SCImago Journal Rankings: 0.034) |
| dc.identifier.scopus | eid_2-s2.0-37849007688 |
| dc.identifier.spage | 40 |
| dc.identifier.uri | http://hdl.handle.net/10722/93153 |
| dc.identifier.volume | 4580 LNCS |
| dc.language | eng |
| dc.publisher | Springer Verlag. The Journal's web site is located at http://springerlink.com/content/105633/ |
| dc.publisher.place | Germany |
| dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| dc.title | Cache-oblivious index for approximate string matching |
| dc.type | Conference_Paper |
Author Affiliations
1. The University of Hong Kong
2. National Tsing Hua University
3. Purdue University

