Statistical analysis of human gastrointestinal microbiota using next generation sequencing data

Qin, Youwen; 覃友文

DOI: 10.5353/th_b5719476

Appears in Collections:
- HKU Theses Online
- Psychiatry: Theses

postgraduate thesis: Statistical analysis of human gastrointestinal microbiota using next generation sequencing data

Title	Statistical analysis of human gastrointestinal microbiota using next generation sequencing data
Authors	Qin, Youwen 覃友文
Issue Date	2015
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476
Abstract	The human gastrointestinal tract is the niche of both commensal and pathogenic microbes, which play an important role in human health. This thesis includes two independent studies on the analysis of next-generation sequencing data from the human gastrointestinal microbiota. The first study conducted a comparative analysis of 16S rRNA gene sequencing data obtained from gastritis and gastric cancer patients in Hong Kong (HK) and Korean cohorts. Neisseriaceae and Lachnospiraceae were the important families in segregating gastritis and cancer samples in the HK dataset, whereas Streptococcaceae played that role in the Korean dataset. Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria were the major phyla in the two cohorts, where they made up ≥ 99% of the total relative abundance. However, at the family level, the two datasets shared only 5 major families among the 15 and 13 major families in the HK and Korean datasets, respectively. Hierarchical clustering showed that samples were segregated into two major clusters according to the relative abundance of Helicobacter pylori (H. pylori) in the two datasets. Moreover, cross-prediction of gastritis versus cancer between the two datasets yielded error rates up to 3 times larger than those of prediction within the training set. Taken together, the differences between the HK and Korean cohorts in the gastric microbiota outweighed the similarities. The second study developed a computational workflow to improve the draft genomes assembled from shotgun metagenomic sequencing data. Publicly available sequencing data from 396 human stool samples were downloaded for this purpose. First, 3.9 million genes assembled from the 396 samples were clustered into 7,381 co-abundance gene groups (CAGs) according to their pairwise correlations. The 741 CAGs with more than 700 genes were defined as metagenomic species (MGSs), while the remaining 6,640 CAGs were defined as metagenomic units (MGUs). To recover the relevant MGSs of the MGUs, the metagenomic deconvolution framework, which decomposes community-level gene content into taxon-specific gene profiles, was applied. Overall, 377 MGUs were assigned to 354 relevant MGSs, achieving a 9.57% mean improvement in the gene count of MGSs. Most of these MGSs were annotated to the phylum Firmicutes. Specifically, augmenting the 9 MGSs annotated to the genus Faecalibacterium with their related MGUs achieved average improvements of 21.08% and 17.84% in sensitivity and specificity, respectively. Importantly, MGUs included essential genes that were missed in MGSs, such as ribosomal genes and genes for metabolism and transport systems. Hence, the implementation of metagenomic deconvolution after binning improves the draft genomes of metagenomic species.
Degree	Master of Philosophy
Subject	Gastrointestinal system - Microbiology; Nucleotide sequence
Dept/Program	Psychiatry
Persistent Identifier	http://hdl.handle.net/10722/223622
HKU Library Item ID	b5719476
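
The abstract's second study groups genes by the pairwise correlation of their abundance profiles and then splits the groups by size, with groups of more than 700 genes treated as metagenomic species and the rest as metagenomic units. The following is a minimal Python sketch of that idea only, not the thesis's actual pipeline. The greedy seed-based grouping, the correlation cutoff `min_r=0.9`, the function names, and the toy profiles are all assumptions made for illustration; only the size-based split and its 700-gene default come from the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(vx * vy)


def group_by_coabundance(profiles, min_r=0.9):
    """Greedy stand-in for co-abundance clustering (an assumption, not the
    thesis's method): a gene joins the first group whose seed profile it
    correlates with at >= min_r, otherwise it seeds a new group."""
    groups = []  # list of (seed_profile, member_gene_ids)
    for gene, profile in profiles.items():
        for seed, members in groups:
            if pearson(seed, profile) >= min_r:
                members.append(gene)
                break
        else:
            groups.append((profile, [gene]))
    return [members for _, members in groups]


def split_by_size(groups, min_genes=700):
    """Groups with more than min_genes genes are species-like (MGS);
    all other groups are units (MGU)."""
    mgs = [g for g in groups if len(g) > min_genes]
    mgu = [g for g in groups if len(g) <= min_genes]
    return mgs, mgu


# Toy abundance profiles across four hypothetical samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [1.1, 2.0, 2.9, 4.1],
    "geneD": [4.0, 3.0, 2.0, 1.0],
}
groups = group_by_coabundance(profiles)
# The size cutoff is lowered to 2 so the toy data yields one group of each kind.
mgs, mgu = split_by_size(groups, min_genes=2)
print(mgs, mgu)
```

In this toy input, geneA, geneB and geneC rise together across samples and end up in one group, while geneD falls as the others rise and therefore seeds its own group.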

DC Field	Value	Language
dc.contributor.author	Qin, Youwen	-
dc.contributor.author	覃友文	-
dc.date.accessioned	2016-03-03T23:16:51Z	-
dc.date.available	2016-03-03T23:16:51Z	-
dc.date.issued	2015	-
dc.identifier.citation	Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476	-
dc.identifier.uri	http://hdl.handle.net/10722/223622	-
dc.description.abstract	The human gastrointestinal tract is the niche of both commensal and pathogenic microbes, which play an important role in human health. This thesis includes two independent studies on the analysis of next-generation sequencing data from the human gastrointestinal microbiota. The first study conducted a comparative analysis of 16S rRNA gene sequencing data obtained from gastritis and gastric cancer patients in Hong Kong (HK) and Korean cohorts. Neisseriaceae and Lachnospiraceae were the important families in segregating gastritis and cancer samples in the HK dataset, whereas Streptococcaceae played that role in the Korean dataset. Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria were the major phyla in the two cohorts, where they made up ≥ 99% of the total relative abundance. However, at the family level, the two datasets shared only 5 major families among the 15 and 13 major families in the HK and Korean datasets, respectively. Hierarchical clustering showed that samples were segregated into two major clusters according to the relative abundance of Helicobacter pylori (H. pylori) in the two datasets. Moreover, cross-prediction of gastritis versus cancer between the two datasets yielded error rates up to 3 times larger than those of prediction within the training set. Taken together, the differences between the HK and Korean cohorts in the gastric microbiota outweighed the similarities. The second study developed a computational workflow to improve the draft genomes assembled from shotgun metagenomic sequencing data. Publicly available sequencing data from 396 human stool samples were downloaded for this purpose. First, 3.9 million genes assembled from the 396 samples were clustered into 7,381 co-abundance gene groups (CAGs) according to their pairwise correlations. The 741 CAGs with more than 700 genes were defined as metagenomic species (MGSs), while the remaining 6,640 CAGs were defined as metagenomic units (MGUs). To recover the relevant MGSs of the MGUs, the metagenomic deconvolution framework, which decomposes community-level gene content into taxon-specific gene profiles, was applied. Overall, 377 MGUs were assigned to 354 relevant MGSs, achieving a 9.57% mean improvement in the gene count of MGSs. Most of these MGSs were annotated to the phylum Firmicutes. Specifically, augmenting the 9 MGSs annotated to the genus Faecalibacterium with their related MGUs achieved average improvements of 21.08% and 17.84% in sensitivity and specificity, respectively. Importantly, MGUs included essential genes that were missed in MGSs, such as ribosomal genes and genes for metabolism and transport systems. Hence, the implementation of metagenomic deconvolution after binning improves the draft genomes of metagenomic species.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Gastrointestinal system - Microbiology	-
dc.subject.lcsh	Nucleotide sequence	-
dc.title	Statistical analysis of human gastrointestinal microbiota using next generation sequencing data	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5719476	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Psychiatry	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5719476	-
dc.identifier.mmsid	991019122569703414	-
