File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Article: Compressed indexing and local alignment of DNA

TitleCompressed indexing and local alignment of DNA
Authors
Issue Date2008
PublisherOxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/
Citation
Bioinformatics, 2008, v. 24 n. 6, p. 791-797 How to Cite?
Abstract
Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 G bytes. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O ( T P ) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee to find all local alignments. Results: In this article, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3 000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O ( T 0.628 P ) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to be a replacement of BLAST, as BLAST is still several times faster than BWT-SW for long patterns and BLAST is indeed accurate enough in most cases (we have used BWT-SW to check against the accuracy of BLAST and found that only rarely BLAST would miss some significant alignments). © The Author 2008. Published by Oxford University Press. All rights reserved.
Persistent Identifierhttp://hdl.handle.net/10722/89140
ISSN
2013 Impact Factor: 4.621
ISI Accession Number ID
References

 

DC FieldValueLanguage
dc.contributor.authorLam, TWen_HK
dc.contributor.authorSung, WKen_HK
dc.contributor.authorTam, SLen_HK
dc.contributor.authorWong, CKen_HK
dc.contributor.authorYiu, SMen_HK
dc.date.accessioned2010-09-06T09:52:53Z-
dc.date.available2010-09-06T09:52:53Z-
dc.date.issued2008en_HK
dc.identifier.citationBioinformatics, 2008, v. 24 n. 6, p. 791-797en_HK
dc.identifier.issn1367-4803en_HK
dc.identifier.urihttp://hdl.handle.net/10722/89140-
dc.description.abstractMotivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed their practicality for indexing very long strings such as the human genome in the main memory. For example, a BWT index for the human genome (with about 3 billion characters) occupies just around 1 G bytes. However, these indexes are designed for exact pattern matching, which is too stringent for biological applications. The demand is often on finding local alignments (pairs of similar substrings with gaps allowed). Without indexing, one can use dynamic programming to find all the local alignments between a text T and a pattern P in O ( T P ) time, but this would be too slow when the text is of genome scale (e.g. aligning a gene with the human genome would take tens to hundreds of hours). In practice, biologists use heuristic-based software such as BLAST, which is very efficient but does not guarantee to find all local alignments. Results: In this article, we show how to build a software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments. Experiments reveal that BWT-SW is very efficient (e.g. aligning a pattern of length 3 000 with the human genome takes less than a minute). We have also analyzed BWT-SW mathematically for a simpler similarity model (with gaps disallowed), and we show that the expected running time is O ( T 0.628 P ) for random strings. As far as we know, BWT-SW is the first practical tool that can find all local alignments. Yet BWT-SW is not meant to be a replacement of BLAST, as BLAST is still several times faster than BWT-SW for long patterns and BLAST is indeed accurate enough in most cases (we have used BWT-SW to check against the accuracy of BLAST and found that only rarely BLAST would miss some significant alignments). © The Author 2008. Published by Oxford University Press. All rights reserved.en_HK
dc.languageengen_HK
dc.publisherOxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/en_HK
dc.relation.ispartofBioinformaticsen_HK
dc.rightsBioinformatics. Copyright © Oxford University Press.en_HK
dc.subject.meshAlgorithmsen_HK
dc.subject.meshBase Sequenceen_HK
dc.subject.meshChromosome Mapping - methodsen_HK
dc.subject.meshDNA - geneticsen_HK
dc.subject.meshData Compression - methodsen_HK
dc.subject.meshGenome, Human - geneticsen_HK
dc.subject.meshHumansen_HK
dc.subject.meshMolecular Sequence Dataen_HK
dc.subject.meshSequence Alignment - methodsen_HK
dc.subject.meshSequence Analysis, DNA - methodsen_HK
dc.titleCompressed indexing and local alignment of DNAen_HK
dc.typeArticleen_HK
dc.identifier.openurlhttp://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1367-4803&volume=24 &issue=6&spage=791&epage=797&date=2008&atitle=Compressed+indexing+and+local+alignment+of+DNAen_HK
dc.identifier.emailLam, TW:twlam@cs.hku.hken_HK
dc.identifier.emailYiu, SM:smyiu@cs.hku.hken_HK
dc.identifier.authorityLam, TW=rp00135en_HK
dc.identifier.authorityYiu, SM=rp00207en_HK
dc.description.naturelink_to_subscribed_fulltext-
dc.identifier.doi10.1093/bioinformatics/btn032en_HK
dc.identifier.pmid18227115en_HK
dc.identifier.scopuseid_2-s2.0-40749108125en_HK
dc.identifier.hkuros146733en_HK
dc.relation.referenceshttp://www.scopus.com/mlt/select.url?eid=2-s2.0-40749108125&selection=ref&src=s&origin=recordpageen_HK
dc.identifier.volume24en_HK
dc.identifier.issue6en_HK
dc.identifier.spage791en_HK
dc.identifier.epage797en_HK
dc.identifier.eissn1460-2059-
dc.identifier.isiWOS:000254010400008-
dc.publisher.placeUnited Kingdomen_HK
dc.identifier.scopusauthoridLam, TW=7202523165en_HK
dc.identifier.scopusauthoridSung, WK=13310059700en_HK
dc.identifier.scopusauthoridTam, SL=14042926200en_HK
dc.identifier.scopusauthoridWong, CK=7404953816en_HK
dc.identifier.scopusauthoridYiu, SM=7003282240en_HK
dc.identifier.citeulike2835102-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats