Conference Paper: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
Title | Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences |
---|---|
Authors | Hon, WK; Lam, TW; Sung, WK; Tse, WL; Wong, CK; Yiu, SM |
Issue Date | 2004 |
Citation | Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38 |
Abstract | Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes. |
Persistent Identifier | http://hdl.handle.net/10722/93076 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hon, WK | en_HK |
dc.contributor.author | Lam, TW | en_HK |
dc.contributor.author | Sung, WK | en_HK |
dc.contributor.author | Tse, WL | en_HK |
dc.contributor.author | Wong, CK | en_HK |
dc.contributor.author | Yiu, SM | en_HK |
dc.date.accessioned | 2010-09-25T14:50:10Z | - |
dc.date.available | 2010-09-25T14:50:10Z | - |
dc.date.issued | 2004 | en_HK |
dc.identifier.citation | Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/93076 | - |
dc.description.abstract | Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes. | en_HK |
dc.language | eng | en_HK |
dc.relation.ispartof | Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics | en_HK |
dc.title | Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.email | Lam, TW:twlam@cs.hku.hk | en_HK |
dc.identifier.email | Yiu, SM:smyiu@cs.hku.hk | en_HK |
dc.identifier.authority | Lam, TW=rp00135 | en_HK |
dc.identifier.authority | Yiu, SM=rp00207 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-8344235972 | en_HK |
dc.identifier.hkuros | 103185 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-8344235972&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.spage | 31 | en_HK |
dc.identifier.epage | 38 | en_HK |
dc.identifier.scopusauthorid | Hon, WK=7004282818 | en_HK |
dc.identifier.scopusauthorid | Lam, TW=7202523165 | en_HK |
dc.identifier.scopusauthorid | Sung, WK=13310059700 | en_HK |
dc.identifier.scopusauthorid | Tse, WL=35992065800 | en_HK |
dc.identifier.scopusauthorid | Wong, CK=7404953816 | en_HK |
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK |