postgraduate thesis: Statistical analysis of human gastrointestinal microbiota using next generation sequencing data
Title | Statistical analysis of human gastrointestinal microbiota using next generation sequencing data |
---|---|
Authors | Qin, Youwen (覃友文) |
Issue Date | 2015 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476 |
Abstract | The human gastrointestinal tract is the niche of both commensal and pathogenic microbes which play an important role in human health. This thesis includes two independent studies relevant to analyzing next-generation sequencing data on the human gastrointestinal microbiota.
The first study conducted a comparative analysis on 16S rRNA gene sequencing data obtained from gastritis and gastric cancer patients in the Hong Kong (HK) and Korean cohorts. Neisseriaceae and Lachnospiraceae were the important families in segregating gastritis and cancer samples in the HK dataset, whereas Streptococcaceae was the important family in the Korean dataset. Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria were the major phyla in the two cohorts, where together they made up ≥ 99% of the total relative abundance. However, when narrowed down to the family level, the two datasets only shared 5 major families among the 15 and 13 major families in the HK and Korean datasets, respectively. Hierarchical clustering showed that samples were segregated into two major clusters according to the relative abundance of Helicobacter pylori (H. pylori) in the two datasets. Moreover, the cross-prediction results for gastritis versus cancer between the two datasets yielded up to 3 times larger error rates compared to the prediction results within the training set. Taken together, the differences between the HK and Korean cohorts in the gastric microbiota outweighed the similarities.
The second study developed a computational workflow to improve the draft genomes assembled from shotgun metagenomic sequencing data. The publicly available sequencing data of 396 human stool samples were downloaded for this purpose. Firstly, 3.9 million genes assembled from the 396 samples were clustered into 7,381 co-abundance gene groups (CAGs) according to their pairwise correlations. The CAGs (741 CAGs) with more than 700 genes were defined as metagenomic species (MGSs), while the others (6,640 CAGs) were defined as metagenomic units (MGUs). In order to recover the relevant MGSs of the MGUs, the metagenomic deconvolution framework, which decomposes community-level gene content into taxon-specific gene profiles, was applied. Overall, 377 MGUs were assigned to 354 relevant MGSs, achieving a 9.57% mean improvement in the gene count of MGSs. Most of these MGSs were annotated to phylum Firmicutes. Specifically, the augmented results of 9 MGSs annotated to genus Faecalibacterium by their relative MGUs achieved average improvements of 21.08% and 17.84% in sensitivity and specificity, respectively. Importantly, MGUs included essential genes that were missed in MGSs, such as ribosomal genes, metabolism and transport system genes. Hence, the implementation of metagenomic deconvolution after binning improves the draft genomes of metagenomic species. |
Degree | Master of Philosophy |
Subject | Gastrointestinal system - Microbiology; Nucleotide sequence |
Dept/Program | Psychiatry |
Persistent Identifier | http://hdl.handle.net/10722/223622 |
HKU Library Item ID | b5719476 |
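
The binning step summarized in the abstract (grouping genes by pairwise abundance correlation, then labelling groups with more than 700 genes as MGSs and the rest as MGUs) can be illustrated with a small sketch. This is not the thesis's implementation: the greedy seed-based grouping, the function names, and the `min_corr` cutoff are all assumptions made for illustration, and a dense correlation matrix like the one used here would not scale to millions of genes. Only the 700-gene threshold comes from the abstract.

```python
import numpy as np

# Threshold stated in the abstract: CAGs with more than 700 genes are MGSs.
MGS_MIN_GENES = 700


def cluster_by_coabundance(abundance, min_corr=0.9):
    """Toy greedy grouping of genes (rows) by abundance across samples (columns).

    Each unassigned gene seeds a new group and pulls in every other
    unassigned gene whose Pearson correlation with the seed is at least
    min_corr. The cutoff value is a hypothetical choice, not from the thesis.
    """
    corr = np.corrcoef(abundance)  # genes x genes correlation matrix
    n_genes = abundance.shape[0]
    labels = np.full(n_genes, -1)
    next_label = 0
    for seed in range(n_genes):
        if labels[seed] != -1:
            continue  # gene already belongs to an earlier group
        members = (labels == -1) & (corr[seed] >= min_corr)
        members[seed] = True  # the seed always joins its own group
        labels[members] = next_label
        next_label += 1
    return labels


def split_mgs_mgu(labels, min_genes=MGS_MIN_GENES):
    """Return (MGS group ids, MGU group ids) based on group size."""
    ids, sizes = np.unique(labels, return_counts=True)
    mgs = [int(i) for i, s in zip(ids, sizes) if s > min_genes]
    mgu = [int(i) for i, s in zip(ids, sizes) if s <= min_genes]
    return mgs, mgu
```

With a real gene-by-sample abundance matrix, `split_mgs_mgu(cluster_by_coabundance(abundance))` would return the group ids on each side of the size threshold; for a quick check on toy data, pass a small `min_genes` value instead of the default.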
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Qin, Youwen | - |
dc.contributor.author | 覃友文 | - |
dc.date.accessioned | 2016-03-03T23:16:51Z | - |
dc.date.available | 2016-03-03T23:16:51Z | - |
dc.date.issued | 2015 | - |
dc.identifier.citation | Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476 | - |
dc.identifier.uri | http://hdl.handle.net/10722/223622 | - |
dc.description.abstract | The human gastrointestinal tract is the niche of both commensal and pathogenic microbes which play an important role in human health. This thesis includes two independent studies relevant to analyzing next-generation sequencing data on the human gastrointestinal microbiota. The first study conducted a comparative analysis on 16S rRNA gene sequencing data obtained from gastritis and gastric cancer patients in the Hong Kong (HK) and Korean cohorts. Neisseriaceae and Lachnospiraceae were the important families in segregating gastritis and cancer samples in the HK dataset, whereas Streptococcaceae was the important family in the Korean dataset. Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria were the major phyla in the two cohorts, where together they made up ≥ 99% of the total relative abundance. However, when narrowed down to the family level, the two datasets only shared 5 major families among the 15 and 13 major families in the HK and Korean datasets, respectively. Hierarchical clustering showed that samples were segregated into two major clusters according to the relative abundance of Helicobacter pylori (H. pylori) in the two datasets. Moreover, the cross-prediction results for gastritis versus cancer between the two datasets yielded up to 3 times larger error rates compared to the prediction results within the training set. Taken together, the differences between the HK and Korean cohorts in the gastric microbiota outweighed the similarities. The second study developed a computational workflow to improve the draft genomes assembled from shotgun metagenomic sequencing data. The publicly available sequencing data of 396 human stool samples were downloaded for this purpose. Firstly, 3.9 million genes assembled from the 396 samples were clustered into 7,381 co-abundance gene groups (CAGs) according to their pairwise correlations. The CAGs (741 CAGs) with more than 700 genes were defined as metagenomic species (MGSs), while the others (6,640 CAGs) were defined as metagenomic units (MGUs). In order to recover the relevant MGSs of the MGUs, the metagenomic deconvolution framework, which decomposes community-level gene content into taxon-specific gene profiles, was applied. Overall, 377 MGUs were assigned to 354 relevant MGSs, achieving a 9.57% mean improvement in the gene count of MGSs. Most of these MGSs were annotated to phylum Firmicutes. Specifically, the augmented results of 9 MGSs annotated to genus Faecalibacterium by their relative MGUs achieved average improvements of 21.08% and 17.84% in sensitivity and specificity, respectively. Importantly, MGUs included essential genes that were missed in MGSs, such as ribosomal genes, metabolism and transport system genes. Hence, the implementation of metagenomic deconvolution after binning improves the draft genomes of metagenomic species. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Gastrointestinal system - Microbiology | - |
dc.subject.lcsh | Nucleotide sequence | - |
dc.title | Statistical analysis of human gastrointestinal microbiota using next generation sequencing data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5719476 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Psychiatry | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5719476 | - |
dc.identifier.mmsid | 991019122569703414 | - |