Conference Paper: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
Title: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
 
Authors: Hon, WK (1)
Lam, TW (1)
Sung, WK (2)
Tse, WL (1)
Wong, CK (1)
Yiu, SM (1)
 
Issue Date: 2004
 
Citation: Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38
 
Abstract: Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes.
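For readers unfamiliar with the FM-index named in the abstract, the following is a minimal sketch of how it answers pattern queries by backward search. It is not the authors' implementation and does not reproduce the compressed layouts the paper evaluates: the function names `build_fm_index` and `count_and_locate` are hypothetical, the suffix array is built naively and kept uncompressed, and a `$` sentinel is assumed not to occur in the input.

```python
def build_fm_index(text):
    """Build a naive, uncompressed FM-index for a short DNA string."""
    text += "$"  # sentinel: assumed absent from text and smaller than A, C, G, T
    # Naive suffix array construction; fine for a demo, not for genome-scale input.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def count_and_locate(pattern, index):
    """Return the sorted start positions of pattern, found by backward search."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_fm_index("ACGTACGTTAGC")
print(count_and_locate("ACG", index))
```

Each pattern character costs two table lookups, so the number of steps to count matches grows with the pattern length rather than the text length. A practical index would replace the full `occ` table and suffix array with sampled, compressed versions to reduce memory.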
 
References: References in Scopus
 
DC Field: Value
dc.contributor.author: Hon, WK
dc.contributor.author: Lam, TW
dc.contributor.author: Sung, WK
dc.contributor.author: Tse, WL
dc.contributor.author: Wong, CK
dc.contributor.author: Yiu, SM
dc.date.accessioned: 2010-09-25T14:50:10Z
dc.date.available: 2010-09-25T14:50:10Z
dc.date.issued: 2004
dc.description.abstract: Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes.
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.citation: Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38
dc.identifier.epage: 38
dc.identifier.hkuros: 103185
dc.identifier.scopus: eid_2-s2.0-8344235972
dc.identifier.spage: 31
dc.identifier.uri: http://hdl.handle.net/10722/93076
dc.language: eng
dc.relation.ispartof: Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics
dc.relation.references: References in Scopus
dc.title: Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences
dc.type: Conference_Paper
 
<?xml version="1.0" encoding="utf-8"?>
<item><contributor.author>Hon, WK</contributor.author>
<contributor.author>Lam, TW</contributor.author>
<contributor.author>Sung, WK</contributor.author>
<contributor.author>Tse, WL</contributor.author>
<contributor.author>Wong, CK</contributor.author>
<contributor.author>Yiu, SM</contributor.author>
<date.accessioned>2010-09-25T14:50:10Z</date.accessioned>
<date.available>2010-09-25T14:50:10Z</date.available>
<date.issued>2004</date.issued>
<identifier.citation>Proceedings Of The Sixth Workshop On Algorithm Engineering And Experiments And The First Workshop On Analytic Algorithms And Combinatorics, 2004, p. 31-38</identifier.citation>
<identifier.uri>http://hdl.handle.net/10722/93076</identifier.uri>
<description.abstract>Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compressed data structures, Compressed Suffix Array (CSA) and FM-index, in the context of searching and indexing DNA sequences. Our results show that CSA is better than FM-index for searching long patterns. We also investigate other practical aspects of the data structures such as the memory requirement for building the indexes.</description.abstract>
<language>eng</language>
<relation.ispartof>Proceedings of the Sixth Workshop on Algorithm Engineering and Experiments and the First Workshop on Analytic Algorithms and Combinatorics</relation.ispartof>
<title>Practical aspects of compressed suffix arrays and FM-index in searching DNA sequences</title>
<type>Conference_Paper</type>
<description.nature>link_to_subscribed_fulltext</description.nature>
<identifier.scopus>eid_2-s2.0-8344235972</identifier.scopus>
<identifier.hkuros>103185</identifier.hkuros>
<relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-8344235972&amp;selection=ref&amp;src=s&amp;origin=recordpage</relation.references>
<identifier.spage>31</identifier.spage>
<identifier.epage>38</identifier.epage>
</item>
Author Affiliations
  1. The University of Hong Kong
  2. National University of Singapore