postgraduate thesis: Data mining of post genome-wide association studies and next generation sequencing

Title: Data mining of post genome-wide association studies and next generation sequencing
Authors: Gui, Hongsheng (桂宏胜)
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Gui, H. [桂宏胜]. (2013). Data mining of post genome-wide association studies and next generation sequencing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5106519
Abstract: Genome-wide association studies (GWAS) have been successfully applied to several complex diseases, yielding many confirmed associations. Nonetheless, at most they have explained half the genetic variance and often much less. It is apparent that the rich GWAS datasets contain far more information than is typically uncovered using the most common univariate analysis approaches. The focus of the present thesis is on methods to extract the most information from GWAS and on post-GWAS experimental strategies, divided into four broad approaches.

The first approach involves the use of candidate gene studies to explore epistasis and gene-by-environment interactions, using samples of two different disorders, Hirschsprung disease (HSCR) and cognitive decline. For HSCR, previous studies identified rare and common variants in two genes, RET and NRG1, as predisposing to disease, and further demonstrated a statistical interaction between common variants in these two genes. In this thesis, joint effects between common and rare variants both within and across the two genes were demonstrated by statistical modelling and then supported by functional interaction. For cognitive decline, SNPs previously implicated in Alzheimer’s disease were examined for epistasis and gene-environment interaction in an independent sample of elderly Chinese. The ACE rs1800764_C heterozygote in combination with below-college educational level was found to result in greater cognitive decline. These two studies demonstrate the utility of post-GWAS candidate gene studies in detecting interaction effects.

The next two approaches were applied to GWAS summary statistics at the SNP level. One of them involves meta-analysis applied to 11 epilepsy GWAS datasets, to increase power and explore whether findings are population-specific or general across populations. Two novel susceptibility genes (SCN1A and PCDH7) were identified using this approach. Furthermore, the previously identified epilepsy risk variant CAMSAP1L1 was found to be a risk factor only for Chinese focal epilepsy patients. The other summary statistic approach involved the development of a revised GWAS pathway analysis pipeline to search for effective genes or gene-sets. Its application to two autoimmune diseases revealed that multiple pathways might be dysfunctional simultaneously and hence contribute jointly to disease status. In addition, it indicated the pipeline was powerful for mining moderate/small genetic effects on common disorders.

The last approach to post-GWAS analysis involves the use of next-generation sequencing (NGS). To this end, an automated NGS pipeline for variant calling, filtering and prioritization was established, specifically designed for gene burden analysis, recurrent gene sharing and de novo mutation (DNM) identification. The pipeline was applied to NGS of 62 candidate genes and also whole exomes of HSCR patients and their parents. Results indicated that multiple rare damaging inherited variants in several genes contribute to HSCR; in addition, loss-of-function DNMs were significantly enriched in HSCR probands.

This thesis demonstrates the utility of data mining approaches for the dissection and exploration of genetic determinants of complex diseases. Such methods and their results should ultimately contribute to genetic diagnosis and improving treatment for complex disorders.
Degree: Doctor of Philosophy
Subject: Data mining; Nucleotide sequence - Data processing; Molecular genetics
Dept/Program: Psychiatry
Persistent Identifier: http://hdl.handle.net/10722/193419
HKU Library Item ID: b5106519

 

DC Field | Value | Language
dc.contributor.author | Gui, Hongsheng | -
dc.contributor.author | 桂宏胜 | -
dc.date.accessioned | 2014-01-06T23:09:12Z | -
dc.date.available | 2014-01-06T23:09:12Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Gui, H. [桂宏胜]. (2013). Data mining of post genome-wide association studies and next generation sequencing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5106519 | -
dc.identifier.uri | http://hdl.handle.net/10722/193419 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Data mining | -
dc.subject.lcsh | Nucleotide sequence - Data processing | -
dc.subject.lcsh | Molecular genetics | -
dc.title | Data mining of post genome-wide association studies and next generation sequencing | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5106519 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Psychiatry | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2013 | -