
postgraduate thesis: An all-purpose genome assembler for next-generation sequencing reads

Title: An all-purpose genome assembler for next-generation sequencing reads
Authors: Luo, Ruibang (罗锐邦)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Luo, R. [罗锐邦]. (2015). An all-purpose genome assembler for next-generation sequencing reads. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5731093
Abstract: De novo genome assembly from next-generation sequencing (NGS) short reads is being performed at a rapidly increasing rate. However, several major challenges must still be overcome to make it efficient, accurate, and versatile. Early assemblers were designed around the very short reads available when NGS first emerged. Although they were successfully applied to assemble some published genomes, they failed to take advantage of reads generated by newer sequencers. The new reads are not only longer but also have improved profiles and patterns that have made some previously prohibitive genome studies feasible. Exploiting them, however, requires new algorithms. SOAPdenovo2 was developed with a new algorithm design that: 1) reduces memory consumption in graph construction; 2) resolves more complex repetitive regions in contig assembly; 3) increases coverage and length in scaffolding; 4) improves gap closing; and 5) is optimized for large genomes. Benchmarks on public datasets showed that SOAPdenovo2 greatly surpasses its predecessor, SOAPdenovo, and is competitive with other assemblers in both assembly length and accuracy. SOAPdenovo2 was developed with versatility as a top priority. Working alone or as part of a pipeline, SOAPdenovo2 demonstrated its power by 1) producing detailed structural variation (SV) maps of an Asian and an African genome, showing that whole-genome de novo assembly could provide a more comprehensive SV map; 2) drafting the highly polymorphic and repetitive oyster genome, showing that complex oceanic species could be assembled by SOAPdenovo2 combined with a hierarchical assembly strategy; and 3) completing the assembly of a haplotype-resolved diploid genome without using a reference genome. The community has also successfully applied SOAPdenovo2 to assemble over a hundred species.
The versatility of SOAPdenovo2 is further exemplified by SOAPdenovo-Trans, an assembler tailored for transcriptome assembly from RNA sequencing (RNA-seq) data. In benchmarks on known transcripts from well-annotated genomes, SOAPdenovo-Trans outperforms two other tools in identifying alternative splicing and differential expression levels.
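The abstract names graph construction and contig assembly without showing what they involve. The sketch below is a generic textbook illustration and not SOAPdenovo2's implementation. It assumes a plain de Bruijn graph (the graph model used by the SOAPdenovo family) built from error-free reads, ignores reverse complements and memory optimizations, and uses the hypothetical function names `build_de_bruijn` and `compact_contigs`. Reads are split into k-mer edges, and maximal non-branching paths are then spelled out as contigs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it.

    Every k-mer in a read contributes one edge from its prefix to its suffix.
    Simplification (assumption): reverse complements are not added, unlike
    real assemblers that treat both DNA strands.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def compact_contigs(graph):
    """Spell out maximal non-branching paths of the graph as contigs."""
    # Count incoming edges for every node.
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_one_in_one_out(n):
        # .get() avoids inserting new keys into the defaultdict.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for start in nodes:
        if is_one_in_one_out(start):
            continue  # interior node: it is covered by a walk from a path start
        for nxt in graph.get(start, ()):
            contig = start + nxt[-1]
            cur = nxt
            # Extend while the path stays unambiguous (one way in, one way out).
            while is_one_in_one_out(cur):
                cur = next(iter(graph[cur]))
                contig += cur[-1]
            contigs.append(contig)
    # Note: isolated cycles made only of 1-in-1-out nodes are not reported.
    return sorted(contigs)


print(compact_contigs(build_de_bruijn(["ACGTT", "GTTCA"], 3)))
print(compact_contigs(build_de_bruijn(["ACGTA", "ACGTC"], 3)))
```

With the overlapping reads `ACGTT` and `GTTCA` and k = 3, the graph is a single unbranched path and the sketch returns one contig, `ACGTTCA`. With `ACGTA` and `ACGTC`, the node `GT` has two successors. The walk therefore stops at `ACGT` and emits `GTA` and `GTC` separately. Branches like this, whether caused by repeats, variants, or sequencing errors, are the kind of ambiguity that the repeat-resolution and scaffolding steps of a real assembler try to reduce.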
Degree: Doctor of Philosophy
Subject: Nucleotide sequence - Data processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/224657

 

DC field: value
dc.contributor.author: Luo, Ruibang
dc.contributor.author: 罗锐邦
dc.date.accessioned: 2016-04-11T23:15:20Z
dc.date.available: 2016-04-11T23:15:20Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/10722/224657
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: Creative Commons: Attribution 3.0 Hong Kong License
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use the work in future works.
dc.subject.lcsh: Nucleotide sequence - Data processing
dc.title: An all-purpose genome assembler for next-generation sequencing reads
dc.type: PG_Thesis
dc.identifier.hkul: b5731093
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version