postgraduate thesis: An all-purpose genome assembler for next-generation sequencing reads
Title | An all-purpose genome assembler for next-generation sequencing reads |
---|---|
Authors | Luo, Ruibang (罗锐邦) |
Issue Date | 2015 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Luo, R. [罗锐邦]. (2015). An all-purpose genome assembler for next-generation sequencing reads. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5731093 |
Abstract | De novo genome assembly using next-generation sequencing (NGS) short reads is growing rapidly. However, several major challenges must still be overcome to make it efficient, accurate, and versatile. Because they stem from the very short read lengths available at the emerging stage of NGS, early assemblers, though successfully applied to assemble some published genomes, failed to leverage reads generated by newer-generation sequencers. The new reads are not only longer but also exhibit improved profiles and patterns that have green-lighted some previously prohibitive genome studies. Exploiting them, however, requires new algorithms.
SOAPdenovo2 was developed with a new algorithm design that 1) reduces memory consumption in graph construction; 2) resolves more complex repetitive regions in contig assembly; 3) increases coverage and length in scaffolding; 4) improves gap closing; and 5) optimizes for large genomes. Benchmarks on public datasets showed that SOAPdenovo2 greatly surpasses its predecessor, SOAPdenovo, and is competitive with other assemblers in both assembly length and accuracy.
SOAPdenovo2 was developed with versatility as a top priority. Working alone or as part of a pipeline, SOAPdenovo2 demonstrated its power by 1) presenting detailed structural variation (SV) maps of an Asian and an African genome and showing that whole-genome de novo assembly could serve as a new route to a more comprehensive SV map; 2) drafting the highly polymorphic and repetitive oyster genome and showing that complicated oceanic species could be assembled by SOAPdenovo2 together with a hierarchical assembly strategy; and 3) finishing the assembly of a haplotype-resolved diploid genome without using a reference genome. The community has also successfully applied SOAPdenovo2 to assemble over a hundred species.
The versatility of SOAPdenovo2 was also exemplified by the development of SOAPdenovo-Trans, an assembler tailored for transcriptome assembly using RNA sequencing data. Benchmarked on known transcripts from well-annotated genomes, SOAPdenovo-Trans outperforms two other software tools in identifying alternative splicing and differential expression levels. |
Degree | Doctor of Philosophy |
Subject | Nucleotide sequence - Data processing |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/224657 |
HKU Library Item ID | b5731093 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Luo, Ruibang | - |
dc.contributor.author | 罗锐邦 | - |
dc.date.accessioned | 2016-04-11T23:15:20Z | - |
dc.date.available | 2016-04-11T23:15:20Z | - |
dc.date.issued | 2015 | - |
dc.identifier.citation | Luo, R. [罗锐邦]. (2015). An all-purpose genome assembler for next-generation sequencing reads. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5731093 | - |
dc.identifier.uri | http://hdl.handle.net/10722/224657 | - |
dc.description.abstract | De novo genome assembly using next-generation sequencing (NGS) short reads is growing rapidly. However, several major challenges must still be overcome to make it efficient, accurate, and versatile. Because they stem from the very short read lengths available at the emerging stage of NGS, early assemblers, though successfully applied to assemble some published genomes, failed to leverage reads generated by newer-generation sequencers. The new reads are not only longer but also exhibit improved profiles and patterns that have green-lighted some previously prohibitive genome studies. Exploiting them, however, requires new algorithms. SOAPdenovo2 was developed with a new algorithm design that 1) reduces memory consumption in graph construction; 2) resolves more complex repetitive regions in contig assembly; 3) increases coverage and length in scaffolding; 4) improves gap closing; and 5) optimizes for large genomes. Benchmarks on public datasets showed that SOAPdenovo2 greatly surpasses its predecessor, SOAPdenovo, and is competitive with other assemblers in both assembly length and accuracy. SOAPdenovo2 was developed with versatility as a top priority. Working alone or as part of a pipeline, SOAPdenovo2 demonstrated its power by 1) presenting detailed structural variation (SV) maps of an Asian and an African genome and showing that whole-genome de novo assembly could serve as a new route to a more comprehensive SV map; 2) drafting the highly polymorphic and repetitive oyster genome and showing that complicated oceanic species could be assembled by SOAPdenovo2 together with a hierarchical assembly strategy; and 3) finishing the assembly of a haplotype-resolved diploid genome without using a reference genome. The community has also successfully applied SOAPdenovo2 to assemble over a hundred species. The versatility of SOAPdenovo2 was also exemplified by the development of SOAPdenovo-Trans, an assembler tailored for transcriptome assembly using RNA sequencing data. Benchmarked on known transcripts from well-annotated genomes, SOAPdenovo-Trans outperforms two other software tools in identifying alternative splicing and differential expression levels. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Nucleotide sequence - Data processing | - |
dc.title | An all-purpose genome assembler for next-generation sequencing reads | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5731093 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5731093 | - |
dc.identifier.mmsid | 991019253579703414 | - |