
Article: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

Title: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
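Authors: Luo, R; Liu, B; Xie, Y; Li, Z; Huang, W; Yuan, J; He, G; Chen, Y; Pan, Q; Liu, Y; Tang, J; Wu, G; Zhang, H; Shi, Y; Liu, Y; Yu, C; Wang, B; Lu, Y; Han, C; Cheung, DWL; Yiu, SM; Peng, S; Zhu, X; Liu, G; Liao, X; Li, Y; Yang, H; Wang, J; Lam, TW; Wang, J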
Keywords: Genome; Assembly; Contig; Scaffold; Error correction; Gap-filling
Issue Date: 2012
Publisher: BioMed Central.
Citation: GigaScience, 2012, v. 1 n. 18, p. 1-6
Abstract: Background: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. Findings: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. Conclusions: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were 20.9 kbp and 22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and peak memory consumption was two-thirds lower.
Persistent Identifier: http://hdl.handle.net/10722/190307
ISSN: 2047-217X
2021 Impact Factor: 7.658
2020 SCImago Journal Rankings: 2.947
PubMed Central ID: PMC3626529
ISI Accession Number ID: WOS:000321040100001

 

DC Field | Value | Language
dc.contributor.author | Luo, R | en_US
dc.contributor.author | Liu, B | en_US
dc.contributor.author | Xie, Y | en_US
dc.contributor.author | Li, Z | en_US
dc.contributor.author | Huang, W | en_US
dc.contributor.author | Yuan, J | en_US
dc.contributor.author | He, G | en_US
dc.contributor.author | Chen, Y | en_US
dc.contributor.author | Pan, Q | -
dc.contributor.author | Liu, Y | -
dc.contributor.author | Tang, J | -
dc.contributor.author | Wu, G | -
dc.contributor.author | Zhang, H | -
dc.contributor.author | Shi, Y | -
dc.contributor.author | Liu, Y | -
dc.contributor.author | Yu, C | -
dc.contributor.author | Wang, B | -
dc.contributor.author | Lu, Y | -
dc.contributor.author | Han, C | -
dc.contributor.author | Cheung, DWL | -
dc.contributor.author | Yiu, SM | -
dc.contributor.author | Peng, S | -
dc.contributor.author | Zhu, X | -
dc.contributor.author | Liu, G | -
dc.contributor.author | Liao, X | -
dc.contributor.author | Li, Y | -
dc.contributor.author | Yang, H | -
dc.contributor.author | Wang, J | -
dc.contributor.author | Lam, TW | -
dc.contributor.author | Wang, J | -
dc.date.accessioned | 2013-09-17T15:18:29Z | -
dc.date.available | 2013-09-17T15:18:29Z | -
dc.date.issued | 2012 | en_US
dc.identifier.citation | GigaScience, 2012, v. 1 n. 18, p. 1-6 | en_US
dc.identifier.issn | 2047-217X | -
dc.identifier.uri | http://hdl.handle.net/10722/190307 | -
dc.language | eng | en_US
dc.publisher | BioMed Central. | en_US
dc.relation.ispartof | GigaScience | en_US
dc.rights | GigaScience. Copyright © BioMed Central. | en_US
dc.subject | Genome | -
dc.subject | Assembly | -
dc.subject | Contig | -
dc.subject | Scaffold | -
dc.subject | Error correction | -
dc.subject | Gap-filling | -
dc.title | SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler | en_US
dc.type | Article | en_US
dc.identifier.email | Luo, R: rbluo@hku.hk | en_US
dc.identifier.email | Liu, B: bhliu@hku.hk | en_US
dc.identifier.email | Cheung, DWL: dcheung@cs.hku.hk | en_US
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_US
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | -
dc.identifier.authority | Cheung, DWL=rp00101 | en_US
dc.identifier.authority | Yiu, SM=rp00207 | en_US
dc.description.nature | link_to_OA_fulltext | -
dc.identifier.doi | 10.1186/2047-217X-1-18 | -
dc.identifier.pmid | 23587118 | -
dc.identifier.pmcid | PMC3626529 | -
dc.identifier.scopus | eid_2-s2.0-84942887758 | -
dc.identifier.hkuros | 222162 | en_US
dc.identifier.volume | 1 | en_US
dc.identifier.issue | 18 | -
dc.identifier.spage | 1 | -
dc.identifier.epage | 6 | -
dc.identifier.isi | WOS:000321040100001 | -
dc.publisher.place | United Kingdom | -
dc.identifier.f1000 | 718180400 | -
dc.identifier.issnl | 2047-217X | -
