postgraduate thesis: Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits
Title | Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits |
---|---|
Authors | Wu, Tian (吳天) |
Advisors | Sham, PC; Tang, SM |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wu, T. [吳天]. (2022). Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | The past two decades have witnessed rich research output on the biological mechanisms, relationships, and genetic risk prediction of complex traits and diseases, based on data generated by genome-wide association studies (GWAS). It is acknowledged that future GWAS yields, specifically the detection of disease-associated single-nucleotide polymorphisms (SNPs) and polygenic prediction performance, depend on the genetic architecture of phenotypes and the GWAS sample size. However, the exact relationship is not yet clear, nor is it clear how to boost genetic risk prediction accuracy effectively by leveraging external information such as functional annotation. To address these issues, this thesis developed and evaluated two novel methods and computational tools. These tools can be used to evaluate the full potential of GWAS study design in disease-associated SNP detection and polygenic prediction.
A fast, flexible, and interactive online tool, the Polygenic Power Calculator (PPC), was developed to predict the key GWAS outcomes under the assumptions of independent SNPs and a point-normal effect size distribution. Through theoretical derivation and simulation validation, the key GWAS outcomes, including the number of independent significant SNPs, the variance explained by these SNPs, and the prediction accuracy of polygenic scores (PGS) constructed from these SNPs, are expressed in closed form as functions of the power distribution of causal SNPs and the parameters capturing the genetic architecture of phenotypes. When the phenotype is binary, a comprehensive scale-transformation method is proposed and recommended, both to facilitate calculation and to generate more interpretable prediction results. The PPC method was applied to four complex traits and evaluated by comparing the theoretical predictions with reported GWAS results. This tool can be used to facilitate the planning of future GWAS and PGS studies, and to explore the future behaviour of GWAS as sample size further increases.
A genetic risk prediction tool, lassosumfunct, was developed to incorporate functional information into the SNP effect size estimation model and to construct PGSs with higher genetic risk prediction accuracy. A novel index was proposed to indicate the biological importance of SNPs. This index was included in an elastic net model that estimates SNP effects with a modified coordinate descent algorithm. Simulation results demonstrate that the newly proposed index is informative and that lassosumfunct generates systematically and significantly better prediction results than methods that do not consider functional information, regardless of the genetic architecture of the phenotype and the sample sizes of the GWAS training, reference, and validation data. Applied to type II diabetes GWAS data, lassosumfunct achieved prediction accuracy close to the highest reported for similar PGS construction methods, with the shortest running time. A user-friendly R package is provided to implement lassosumfunct. This method is expected to efficiently improve genetic risk prediction accuracy, facilitate disease risk stratification in populations, and thus contribute to accelerating the development of precision medicine. |
Degree | Doctor of Philosophy |
Subject | Genomics - Statistical methods |
Dept/Program | Psychiatry |
Persistent Identifier | http://hdl.handle.net/10722/323678 |
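
The abstract states that PPC predicts GWAS outcomes in closed form under independent SNPs and a point-normal effect size distribution. The record does not reproduce the thesis's formulas, so the following is only a minimal sketch of that kind of calculation, not the PPC implementation. It assumes standardized genotypes and phenotype, a quantitative trait, causal effects drawn from N(0, h2 / number of causal SNPs), and a per-SNP z-statistic that is approximately normal with unit variance. The function name `expected_gwas_hits` and all parameter names are hypothetical.

```python
from statistics import NormalDist


def expected_gwas_hits(n_samples, n_snps, prop_causal, h2, alpha=5e-8):
    """Expected counts of significant causal and null SNPs (illustrative only)."""
    std_normal = NormalDist()
    # Two-sided critical value of the per-SNP z-test at significance level alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)

    n_causal = n_snps * prop_causal
    # Point-normal model (assumption): causal effects ~ N(0, h2 / n_causal),
    # all remaining SNPs have an effect of exactly zero.
    effect_var = h2 / n_causal

    # Averaged over the causal effect distribution, the z-statistic is
    # approximately N(0, 1 + n_samples * effect_var), assuming each SNP
    # explains only a small share of the phenotypic variance.
    z_sd = (1.0 + n_samples * effect_var) ** 0.5
    mean_power = 2.0 * std_normal.cdf(-z_crit / z_sd)

    true_hits = n_causal * mean_power
    false_hits = (n_snps - n_causal) * alpha
    return true_hits, false_hits


if __name__ == "__main__":
    # Hypothetical architecture: 60,000 independent SNPs, 1% causal, h2 = 0.5.
    for n in (10_000, 100_000, 1_000_000):
        true_hits, false_hits = expected_gwas_hits(n, 60_000, 0.01, 0.5)
        print(n, round(true_hits, 1), round(false_hits, 4))
```

Under these assumptions the expected number of true discoveries rises with sample size towards the number of causal SNPs, which is the kind of sample-size behaviour the abstract says PPC is meant to explore.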
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Sham, PC | - |
dc.contributor.advisor | Tang, SM | - |
dc.contributor.author | Wu, Tian | - |
dc.contributor.author | 吳天 | - |
dc.date.accessioned | 2023-01-09T01:48:23Z | - |
dc.date.available | 2023-01-09T01:48:23Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Wu, T. [吳天]. (2022). Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/323678 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Genomics - Statistical methods | - |
dc.title | Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Psychiatry | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044625594803414 | - |
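
The abstract describes lassosumfunct as an elastic net model, fitted by a modified coordinate descent algorithm, that includes an index of each SNP's biological importance. This record does not say how the index enters the model, so the sketch below shows only one generic possibility: a per-SNP penalty weight, where a smaller weight means less shrinkage. It is not the lassosumfunct algorithm or its R package API. The names `weighted_elastic_net`, `penalty_weights`, `lam`, and `mix` are hypothetical, `r` stands for SNP-phenotype correlations, and `R` for an LD (SNP-SNP correlation) matrix.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used in lasso-type coordinate updates."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_elastic_net(r, R, penalty_weights, lam, mix=0.5,
                         max_iter=1000, tol=1e-10):
    """Coordinate descent for a weighted elastic net (illustrative only).

    Minimises 0.5*b'Rb - r'b + lam * sum_j w_j * (mix*|b_j| + 0.5*(1-mix)*b_j**2),
    where w_j = penalty_weights[j] is a hypothetical per-SNP weight.
    """
    p = len(r)
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Marginal signal of SNP j minus what the other SNPs already explain.
            partial = r[j] - sum(R[j][k] * beta[k] for k in range(p) if k != j)
            l1 = lam * penalty_weights[j] * mix
            l2 = lam * penalty_weights[j] * (1.0 - mix)
            new_value = soft_threshold(partial, l1) / (R[j][j] + l2)
            max_change = max(max_change, abs(new_value - beta[j]))
            beta[j] = new_value
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Two SNPs in LD with equal marginal signal; the first is given a
    # smaller penalty weight, as if it were functionally more important.
    ld = [[1.0, 0.3], [0.3, 1.0]]
    print(weighted_elastic_net([0.5, 0.5], ld, [0.5, 2.0], lam=0.2))
```

In this sketch a SNP with a larger penalty weight is shrunk more strongly and can be set exactly to zero, while a SNP with a smaller weight keeps more of its marginal signal.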