postgraduate thesis: Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning

Title: Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning
Authors: Zhang, Yifan (張亦凡)
Advisor(s): Lam, TW
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation
Zhang, Y. [張亦凡]. (2022). Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. Meanwhile, thanks to the rapid development of deep learning technologies, more and more previously impractical or computationally heavy tasks have been made possible or easier. Despite the great success of deep learning in numerous fields (e.g., image recognition), the application of deep learning algorithms in bioinformatics, especially in genome assembly, is still scarce. This thesis presents two deep learning-based tools, aiming to improve the quality of genome assemblies from a micro and macro perspective, respectively. CONNET is an accurate genome consensus tool. Genome consensus, which is essential to correct a draft assembly by resolving the discrepancies in the reads, is computationally intensive. In recent years, efficient consensus tools have emerged based on partial-order alignment. We discovered that the spatial relationship of alignment pileup, which could be utilized by deep learning, is crucial to high-quality consensus. CONNET showed the highest accuracy of any existing method. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. M-NET is the first reference-free misassembly detector for Nanopore sequencing data. Misassemblies are usually assessed with the help of a reference genome, which is not available during de novo assembly. M-NET predicts the presence of misassemblies solely based on the alignment pileup of raw reads to the assembly.
Degree: Master of Philosophy
Subject: Genomics; Nanopores; Deep learning (Machine learning)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/322957

 

DC Field | Value | Language
dc.contributor.advisor | Lam, TW | -
dc.contributor.author | Zhang, Yifan | -
dc.contributor.author | 張亦凡 | -
dc.date.accessioned | 2022-11-18T10:42:08Z | -
dc.date.available | 2022-11-18T10:42:08Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Zhang, Y. [張亦凡]. (2022). Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/322957 | -
dc.description.abstract | Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. Meanwhile, thanks to the rapid development of deep learning technologies, more and more previously impractical or computationally heavy tasks have been made possible or easier. Despite the great success of deep learning in numerous fields (e.g., image recognition), the application of deep learning algorithms in bioinformatics, especially in genome assembly, is still scarce. This thesis presents two deep learning-based tools, aiming to improve the quality of genome assemblies from a micro and macro perspective, respectively. CONNET is an accurate genome consensus tool. Genome consensus, which is essential to correct a draft assembly by resolving the discrepancies in the reads, is computationally intensive. In recent years, efficient consensus tools have emerged based on partial-order alignment. We discovered that the spatial relationship of alignment pileup, which could be utilized by deep learning, is crucial to high-quality consensus. CONNET showed the highest accuracy of any existing method. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. M-NET is the first reference-free misassembly detector for Nanopore sequencing data. Misassemblies are usually assessed with the help of a reference genome, which is not available during de novo assembly. M-NET predicts the presence of misassemblies solely based on the alignment pileup of raw reads to the assembly. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Genomics | -
dc.subject.lcsh | Nanopores | -
dc.subject.lcsh | Deep learning (Machine learning) | -
dc.title | Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning | -
dc.type | PG_Thesis | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044609096803414 | -
