postgraduate thesis: Deep computational analysis of metagenomic data in taxonomic and functional dimensions
Title | Deep computational analysis of metagenomic data in taxonomic and functional dimensions |
---|---|
Authors | Yao, Haobin (姚皓彬) |
Advisors | Yiu, SM; Lam, TW |
Issue Date | 2019 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Yao, H. [姚皓彬]. (2019). Deep computational analysis of metagenomic data in taxonomic and functional dimensions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Modern high-throughput sequencing technology enables researchers to extract DNA directly from microbial communities as metagenomic data. This creates a need for computational tools that can analyse, efficiently and accurately, the vast number of whole-genome sequencing reads generated every day. This thesis presents our contributions to the taxonomic and functional analysis of metagenomic data.
Taxonomic annotation is often a preliminary and critical step in a metagenomic analysis pipeline. Although existing tools based on k-mer mapping have made great progress in efficiency, their performance still degrades severely in the absence of closely related reference genomes. We therefore developed MetaAnnotator and Taxasense, two novel tools that significantly outperform existing tools when no species-level reference is available.
The core concepts of MetaAnnotator, the main breakthrough of this work, are: (i) similarity calculated from k-mers in the protein-coding regions of the references is more reliable; (ii) to determine the taxonomic level at which a sequence should be annotated, we compute probabilistic models for every pair of reference genome and taxonomic node in the reference database; (iii) we adopt a BWT index to accelerate k-mer search queries.
Taxasense accelerates MetaAnnotator by adopting a wavelet-tree index. It retains MetaAnnotator's performance advantage while also being applicable to raw reads and short contigs.
The other dimension of metagenomics we address is the discovery of antibiotic resistance genes (ARGs) in metagenomic data. To tackle the problem of inconsistent results between different ARG databases, we conducted an in-depth review of existing databases and identified the reasons for the inconsistency. Additionally, we propose methods to reduce the expected error rate of CARD, currently the most widely used ARG database.
|
Degree | Doctor of Philosophy |
Subject | Metagenomics - Data processing |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/281605 |
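The abstract contrasts MetaAnnotator with existing tools based on k-mer mapping. The Python sketch below illustrates, under simplifying assumptions, the general idea behind such k-mer tools, not MetaAnnotator itself: each reference k-mer is mapped to the lowest common ancestor (LCA) of all taxa that contain it, and a read is assigned by a simple majority vote over the taxa of its k-mers. The toy taxonomy, the reference sequences, k = 4 and every function name are hypothetical; production tools use much larger k and more elaborate scoring.

```python
from collections import Counter

# Hypothetical toy taxonomy: each taxon maps to its parent; the root maps to itself.
PARENT = {
    "root": "root",
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "E. coli": "Escherichia",
    "E. fergusonii": "Escherichia",
}


def lineage(taxon):
    """Return the path from taxon up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each reference k-mer to the LCA of all taxa whose sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index[km] = lca(index[km], taxon) if km in index else taxon
    return index


def classify(read, index, k):
    """Assign a read to the taxon hit by most of its k-mers (simple majority vote)."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None  # unclassified: the read shares no k-mer with any reference
    return hits.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical reference sequences, one per species.
    refs = {"E. coli": "ACGTACGTTGCA", "E. fergusonii": "ACGTACGTAAAA"}
    index = build_index(refs, k=4)
    print(classify("GTTGCA", index, k=4))  # E. coli
    print(classify("ACGTAC", index, k=4))  # Escherichia (k-mers shared by both species)
    print(classify("GGGGGG", index, k=4))  # None (no k-mer found in the index)
```

In this toy run, a read whose k-mers occur in both *Escherichia* references can only be annotated at genus level, and a read with no k-mer in the index stays unclassified: a small-scale view of why exact k-mer matching struggles when no closely related reference genome is present.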
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yiu, SM | - |
dc.contributor.advisor | Lam, TW | - |
dc.contributor.author | Yao, Haobin | - |
dc.contributor.author | 姚皓彬 | - |
dc.date.accessioned | 2020-03-18T11:33:03Z | - |
dc.date.available | 2020-03-18T11:33:03Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Yao, H. [姚皓彬]. (2019). Deep computational analysis of metagenomic data in taxonomic and functional dimensions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/281605 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Metagenomics - Data processing | - |
dc.title | Deep computational analysis of metagenomic data in taxonomic and functional dimensions | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044214993403414 | - |
dc.date.hkucongregation | 2020 | - |
dc.identifier.mmsid | 991044214993403414 | - |
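The abstract states that MetaAnnotator uses a BWT index to accelerate k-mer search queries and that Taxasense adopts a wavelet-tree index instead. As a textbook illustration only, not the thesis implementation, the Python sketch below builds a toy FM-index over a reference string: the Burrows-Wheeler transform (BWT) is derived from a naively sorted suffix array, and the rank queries needed by backward search are answered by a wavelet tree. The names `WaveletTree`, `FMIndex`, `backward_search` and `locate`, and the `$` sentinel, are illustrative assumptions; real indexes use compressed bit vectors and sampled suffix arrays.

```python
class WaveletTree:
    """Pointer-based wavelet tree: rank(c, i) counts symbol c in seq[:i]."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return  # all symbols at a leaf are identical, so rank is just i
        mid = len(self.alphabet) // 2
        self.left_set = set(self.alphabet[:mid])
        # Bit 0 routes a symbol to the left child, bit 1 to the right child;
        # prefix[i] stores the number of 1-bits among the first i symbols.
        self.prefix = [0]
        for ch in seq:
            self.prefix.append(self.prefix[-1] + int(ch not in self.left_set))
        self.left = WaveletTree([ch for ch in seq if ch in self.left_set], self.alphabet[:mid])
        self.right = WaveletTree([ch for ch in seq if ch not in self.left_set], self.alphabet[mid:])

    def rank(self, c, i):
        if c not in self.alphabet:
            return 0
        if self.leaf:
            return i
        ones = self.prefix[i]
        if c in self.left_set:
            return self.left.rank(c, i - ones)  # position i maps to i - ones on the left
        return self.right.rank(c, ones)         # and to ones on the right


class FMIndex:
    def __init__(self, text):
        self.text = text + "$"  # '$' sorts before A, C, G and T
        # Naive suffix array: fine for a toy example, far too slow for genomes.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])
        # BWT[j] is the character preceding suffix sa[j] (text[-1] is '$').
        bwt = "".join(self.text[i - 1] for i in self.sa)
        self.occ = WaveletTree(bwt)
        # C[c] = number of characters in the text strictly smaller than c.
        self.C, total = {}, 0
        for c in sorted(set(self.text)):
            self.C[c] = total
            total += self.text.count(c)

    def backward_search(self, pattern):
        """Return the suffix-array interval [lo, hi) of suffixes starting with pattern."""
        lo, hi = 0, len(self.text)
        for c in reversed(pattern):  # extend the match one character to the left
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ.rank(c, lo)
            hi = self.C[c] + self.occ.rank(c, hi)
            if lo >= hi:
                return 0, 0
        return lo, hi

    def locate(self, pattern):
        lo, hi = self.backward_search(pattern)
        return sorted(self.sa[lo:hi])
```

For example, `FMIndex("ACGTACGTTGCA").locate("ACGT")` returns the start positions `[0, 4]`. Each backward-search step costs two rank queries, each answered in O(log σ) steps down the wavelet tree, so a k-mer lookup takes O(k log σ) time independent of the reference length.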