postgraduate thesis: Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits

Title: Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits
Authors: Wu, Tian (吳天)
Advisor(s): Sham, PC; Tang, SM
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wu, T. [吳天]. (2022). Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The past two decades have produced a rich body of research on the biological mechanisms, relationships, and genetic risk prediction of complex traits and diseases, based on data generated by genome-wide association studies (GWAS). It is acknowledged that future GWAS yields, specifically the detection of disease-associated single-nucleotide polymorphisms (SNPs) and polygenic prediction performance, depend on the genetic architecture of the phenotype and the GWAS sample size. However, the exact relationship is not yet clear, nor is it clear how to effectively boost genetic risk prediction accuracy by leveraging external information such as functional annotation. To address these issues, this thesis developed and evaluated two novel methods and computational tools. These tools can be used to evaluate the full potential of a GWAS study design for disease-associated SNP detection and polygenic prediction. A fast, flexible, and interactive online tool, the Polygenic Power Calculator (PPC), was developed to predict the key GWAS outcomes under the assumptions of independent SNPs and a point-normal effect size distribution. Through careful theoretical derivation and simulation validation, the key GWAS outcomes (including the number of independent significant SNPs, the variance explained by these SNPs, and the prediction accuracy of polygenic scores (PGS) constructed from these SNPs) are expressed in closed form as functions of the power distribution of causal SNPs and the parameters capturing the genetic architecture of the phenotype. When the phenotype is binary, a comprehensive scale-transformation method is proposed and recommended to facilitate calculation and to generate more interpretable prediction results. The PPC method was applied to four complex traits and evaluated by comparing its theoretical predictions with reported GWAS results. This tool can be used to facilitate the planning of future GWAS and PGS studies, and to explore the future behaviour of GWAS as sample sizes increase further.
A genetic risk prediction tool, lassosumfunct, was developed to incorporate functional information into the SNP effect size estimation model and to construct PGSs with higher genetic risk prediction accuracy. A novel index was proposed to indicate the biological importance of SNPs. This index was included in an elastic net model in which SNP effects are estimated by a modified coordinate descent algorithm. Simulation results demonstrate that the newly proposed index is informative and that lassosumfunct generates systematically and significantly better predictions than methods that do not consider functional information, regardless of the genetic architecture of the phenotype and the sample sizes of the GWAS training, reference, and validation data. When applied to type II diabetes GWAS data, lassosumfunct achieved almost the highest prediction accuracy among reported results from similar PGS construction methods, while taking the shortest running time. A user-friendly R package is provided to implement lassosumfunct. This method is expected to efficiently improve genetic risk prediction accuracy, facilitate disease risk stratification in populations, and thus contribute to accelerating the development of precision medicine.
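As a rough illustration of the kind of closed-form calculation the abstract describes for the PPC, the following Python sketch estimates the expected number of genome-wide significant causal SNPs under independent SNPs and a point-normal effect size distribution. It is not the PPC implementation. The function name, parameter names, and the approximation used are assumptions made for this example: genotypes and phenotype are taken as standardized, and the association z-statistic of a causal SNP with effect beta is approximated as normal with mean sqrt(n) * beta and unit variance, so that the marginal z-statistic of a causal SNP is normal with variance 1 + n * h2 / (m * pi_causal). False positives among null SNPs are not counted.

```python
from statistics import NormalDist


def expected_significant_snps(n, m, pi_causal, h2, alpha=5e-8):
    """Approximate expected number of significant causal SNPs in a GWAS.

    Hypothetical helper for illustration only (not the PPC code).
    n          GWAS sample size
    m          number of independent SNPs
    pi_causal  proportion of SNPs with a non-zero effect
    h2         total SNP heritability explained by the causal SNPs
    alpha      two-sided significance threshold per SNP
    Returns (expected count of significant causal SNPs, average power).
    """
    std_normal = NormalDist()
    # Two-sided critical value for the z-statistic.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Point-normal model: causal effects ~ N(0, h2 / (m * pi_causal)).
    var_beta = h2 / (m * pi_causal)
    # Marginal standard deviation of a causal SNP's z-statistic.
    z_sd = (1 + n * var_beta) ** 0.5
    # Power averaged over the causal effect size distribution.
    avg_power = 2 * (1 - std_normal.cdf(z_crit / z_sd))
    return m * pi_causal * avg_power, avg_power


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the thesis.
    for n in (10_000, 100_000, 1_000_000):
        count, power = expected_significant_snps(
            n=n, m=1_000_000, pi_causal=0.01, h2=0.5
        )
        print(f"n={n}: expected hits={count:.1f}, average power={power:.4f}")
```

Under these assumptions the average power depends on the sample size and genetic architecture only through n * h2 / (m * pi_causal), which is one way to see why the yield of a GWAS depends jointly on sample size, heritability, and polygenicity.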
Degree: Doctor of Philosophy
Subject: Genomics - Statistical methods
Dept/Program: Psychiatry
Persistent Identifier: http://hdl.handle.net/10722/323678

 

DC Field | Value | Language
dc.contributor.advisor | Sham, PC | -
dc.contributor.advisor | Tang, SM | -
dc.contributor.author | Wu, Tian | -
dc.contributor.author | 吳天 | -
dc.date.accessioned | 2023-01-09T01:48:23Z | -
dc.date.available | 2023-01-09T01:48:23Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Wu, T. [吳天]. (2022). Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/323678 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Genomics - Statistical methods | -
dc.title | Statistical methods for genome-wide association analysis and polygenic risk prediction in complex traits | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Psychiatry | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044625594803414 | -
