
Postgraduate thesis: Statistical analysis of human gastrointestinal microbiota using next generation sequencing data

Title: Statistical analysis of human gastrointestinal microbiota using next generation sequencing data
Authors: Qin, Youwen (覃友文)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476
Abstract: The human gastrointestinal tract is a niche for both commensal and pathogenic microbes, which play an important role in human health. This thesis comprises two independent studies on the analysis of next-generation sequencing data from the human gastrointestinal microbiota.

The first study was a comparative analysis of 16S rRNA gene sequencing data obtained from patients with gastritis and gastric cancer in Hong Kong (HK) and Korean cohorts. Neisseriaceae and Lachnospiraceae were the families most important for separating gastritis from cancer samples in the HK dataset, whereas Streptococcaceae played this role in the Korean dataset. Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria and Fusobacteria were the major phyla in both cohorts, together making up ≥ 99% of the total relative abundance. At the family level, however, the two datasets shared only 5 major families, out of 15 major families in the HK dataset and 13 in the Korean dataset. Hierarchical clustering segregated the samples in both datasets into two major clusters according to the relative abundance of Helicobacter pylori (H. pylori). Moreover, cross-prediction of gastritis versus cancer between the two datasets yielded error rates up to 3 times higher than prediction within the training set. Taken together, the differences in gastric microbiota between the HK and Korean cohorts outweighed the similarities.

The second study developed a computational workflow to improve draft genomes assembled from shotgun metagenomic sequencing data, using publicly available sequencing data from 396 human stool samples. First, 3.9 million genes assembled from the 396 samples were clustered into 7,381 co-abundance gene groups (CAGs) according to their pairwise correlations. The 741 CAGs with more than 700 genes were defined as metagenomic species (MGSs), and the remaining 6,640 CAGs were defined as metagenomic units (MGUs). To identify the relevant MGS for each MGU, the metagenomic deconvolution framework, which decomposes community-level gene content into taxon-specific gene profiles, was applied. Overall, 377 MGUs were assigned to 354 MGSs, giving a mean improvement of 9.57% in MGS gene count. Most of these MGSs were annotated to the phylum Firmicutes. In particular, for 9 MGSs annotated to the genus Faecalibacterium, augmentation with their related MGUs improved sensitivity and specificity by 21.08% and 17.84% on average, respectively. Importantly, the MGUs contained essential genes that were missing from the MGSs, such as ribosomal, metabolic and transport system genes. Hence, applying metagenomic deconvolution after binning improves the draft genomes of metagenomic species.
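The abstract does not spell out how MGUs are matched to MGSs. As a rough illustration only, the sketch below assumes the linear model that metagenomic deconvolution is generally built on (the community-level abundance of a gene across samples is a non-negative weighted sum of taxon abundances). It treats the MGS abundance profiles as the "taxa" and one MGU's abundance profile as the community-level signal, fits non-negative weights, and assigns the MGU to the MGS with the largest weight. Only the 700-gene threshold comes from the abstract; the function names, the coordinate-descent solver and the argmax assignment rule are hypothetical choices for this example, not the thesis's implementation.

```python
import random

MGS_MIN_GENES = 700  # from the abstract: CAGs with more than 700 genes are MGSs


def split_cags(cag_sizes):
    """Split CAG ids into MGSs (> 700 genes) and MGUs (all other CAGs)."""
    mgs = sorted(c for c, n in cag_sizes.items() if n > MGS_MIN_GENES)
    mgu = sorted(c for c, n in cag_sizes.items() if n <= MGS_MIN_GENES)
    return mgs, mgu


def nnls_coordinate_descent(columns, target, sweeps=2000):
    """Minimise ||sum_k c_k * columns[k] - target||^2 subject to c_k >= 0.

    columns: one abundance profile (list over samples) per MGS.
    target:  abundance profile of one MGU over the same samples.
    Solver choice (cyclic coordinate descent) is an assumption of this sketch.
    """
    n = len(target)
    coef = [0.0] * len(columns)
    residual = list(target)  # residual = target - sum_k coef[k] * columns[k]
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        for k, col in enumerate(columns):
            if norms[k] == 0.0:
                continue
            # Exact minimiser along coordinate k, then clipped at zero.
            step = sum(col[i] * residual[i] for i in range(n)) / norms[k]
            new = max(0.0, coef[k] + step)
            delta = new - coef[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                coef[k] = new
    return coef


def assign_mgu(mgs_profiles, mgu_profile):
    """Return (MGS id, weight) for the MGS with the largest fitted weight."""
    ids = sorted(mgs_profiles)
    coef = nnls_coordinate_descent([mgs_profiles[m] for m in ids], mgu_profile)
    best = max(range(len(ids)), key=lambda k: coef[k])
    return ids[best], coef[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic abundances of three hypothetical MGSs over 20 samples.
    mgs = {m: [rng.uniform(0, 1) for _ in range(20)]
           for m in ("MGS_A", "MGS_B", "MGS_C")}
    mgu = [0.8 * x for x in mgs["MGS_B"]]  # an MGU that tracks MGS_B
    print(split_cags({"cag1": 701, "cag2": 700, "cag3": 5000}))
    print(assign_mgu(mgs, mgu))  # expected: MGS_B with a weight close to 0.8
```

Fitting all MGS profiles jointly, rather than scoring each MGS against the MGU separately, keeps an MGU that merely correlates with several related MGSs from being spread across them; the non-negativity constraint reflects that gene abundance cannot be explained by a negative contribution from a species.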
Degree: Master of Philosophy
Subject: Gastrointestinal system - Microbiology; Nucleotide sequence
Dept/Program: Psychiatry
Persistent Identifier: http://hdl.handle.net/10722/223622

 

DC Field | Value | Language
dc.contributor.author | Qin, Youwen | -
dc.contributor.author | 覃友文 | -
dc.date.accessioned | 2016-03-03T23:16:51Z | -
dc.date.available | 2016-03-03T23:16:51Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Qin, Y. [覃友文]. (2015). Statistical analysis of human gastrointestinal microbiota using next generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5719476 | -
dc.identifier.uri | http://hdl.handle.net/10722/223622 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject.lcsh | Gastrointestinal system - Microbiology | -
dc.subject.lcsh | Nucleotide sequence | -
dc.title | Statistical analysis of human gastrointestinal microbiota using next generation sequencing data | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5719476 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Psychiatry | -
dc.description.nature | published_or_final_version | -
