Conference Paper: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences

Title: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
Authors: Hon, WK (1); Lam, TW (1); Sung, WK (2); Tse, WL (1); Wong, CK (1); Yiu, SM (1)
Issue Date: 2004
Citation: Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics, 2004, p. 31-38
Abstract: Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes.
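For readers unfamiliar with the structures named in the abstract, the following is a minimal sketch of FM-index pattern counting by backward search. It is not the authors' implementation: the function names `build_fm_index` and `count_occurrences` are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed, which is exactly the memory cost a practical index for long DNA sequences would have to avoid.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy, uncompressed FM-index for `text` (illustration only)."""
    text += "$"  # unique terminator; sorts before A, C, G, T
    # Naive suffix array: acceptable for short strings, not for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # first[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = occurrences of c in bwt[:i]; a real index would sample this.
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of `pattern` by backward search over the index."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of suffix-array rows matching so far
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

As a usage example, `count_occurrences(build_fm_index("ACGTACGTTACG"), "ACG")` counts the occurrences of ACG in that string. The search takes one step per pattern character regardless of text length, which is why such indexes are attractive once their memory footprint is reduced.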
DC Field: Value
dc.contributor.author: Hon, WK
dc.contributor.author: Lam, TW
dc.contributor.author: Sung, WK
dc.contributor.author: Tse, WL
dc.contributor.author: Wong, CK
dc.contributor.author: Yiu, SM
dc.date.accessioned: 2010-09-25T14:50:10Z
dc.date.available: 2010-09-25T14:50:10Z
dc.date.issued: 2004
dc.description.nature: Link_to_subscribed_fulltext
dc.identifier.citation: Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics, 2004, p. 31-38
dc.identifier.epage: 38
dc.identifier.hkuros: 103185
dc.identifier.scopus: eid_2-s2.0-8344235972
dc.identifier.spage: 31
dc.identifier.uri: http://hdl.handle.net/10722/93076
dc.language: eng
dc.relation.ispartof: Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics
dc.relation.references: References in Scopus
dc.title: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
dc.type: Conference_Paper
Author Affiliations
  1. The University of Hong Kong
  2. National University of Singapore