Conference Paper: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
| Title | Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences |
|---|---|
| Authors | Hon, WK¹; Lam, TW¹; Sung, WK²; Tse, WL¹; Wong, CK¹; Yiu, SM¹ |
| Issue Date | 2004 |
| Citation | Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38 |
| Abstract | Searching for patterns in a DNA sequence is an important step in biological research. To speed up the search, one can index the DNA sequence. However, classical indexing data structures such as suffix trees and suffix arrays are not feasible for indexing DNA sequences because of their main-memory requirements, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, the Compressed Suffix Array (CSA) and the FM-index, in the context of searching and indexing DNA sequences. Our results show that the CSA is better than the FM-index for searching long patterns. We also investigate other practical aspects of the data structures, such as the memory required to build the indexes. |
| dc.contributor.author | Hon, WK |
| dc.contributor.author | Lam, TW |
| dc.contributor.author | Sung, WK |
| dc.contributor.author | Tse, WL |
| dc.contributor.author | Wong, CK |
| dc.contributor.author | Yiu, SM |
| dc.date.accessioned | 2010-09-25T14:50:10Z |
| dc.date.available | 2010-09-25T14:50:10Z |
| dc.date.issued | 2004 |
| dc.description.nature | Link_to_subscribed_fulltext |
| dc.identifier.citation | Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38 |
| dc.identifier.epage | 38 |
| dc.identifier.hkuros | 103185 |
| dc.identifier.scopus | eid_2-s2.0-8344235972 |
| dc.identifier.spage | 31 |
| dc.identifier.uri | http://hdl.handle.net/10722/93076 |
| dc.language | eng |
| dc.relation.ispartof | Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics |
| dc.title | Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences |
| dc.type | Conference_Paper |
Author Affiliations
1. The University of Hong Kong
2. National University of Singapore

