postgraduate thesis: Exploring statistical methods for estimating heritability, functional enrichment and polygenic risk score using GWAS summary data in complex traits

Title: Exploring statistical methods for estimating heritability, functional enrichment and polygenic risk score using GWAS summary data in complex traits
Authors: Xiong, Zewei (熊泽蔚)
Advisor(s): Sham, PC; Zhang, Y
Issue Date: 2025
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Xiong, Z. [熊泽蔚]. (2025). Exploring statistical methods for estimating heritability, functional enrichment and polygenic risk score using GWAS summary data in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Genome-wide association studies (GWAS) have transformed our understanding of the genetic basis of many diseases, uncovering more than 50,000 significant associations between genetic variants and common diseases or traits. These discoveries have revealed previously unknown disease-causing genes and mechanisms, and they have accelerated personalized medicine by facilitating the identification of new drug targets and of biomarkers for early detection and monitoring, by improving risk prediction, and by supporting the development of therapies tailored to individual genotypes. This thesis focuses on post-GWAS analysis, in particular the estimation of heritability, functional enrichment, and polygenic risk scores from GWAS summary data. It begins with a concise review of the standard GWAS procedure and of the background to SNP heritability estimation and polygenic risk score (PGS) calculation in post-GWAS analysis, followed by a brief review of existing methods with similar objectives. Two new software tools are then proposed. The first, generalized LD score regression (g-LDSC), partitions SNP heritability to estimate functional enrichment. It exploits the relationship between $\chi^2$-statistics and the squared LD matrix and, unlike stratified LD score regression (s-LDSC), uses feasible generalized least squares (FGLS) estimation to account for possible correlation among the regression errors. Simulation studies under various scenarios show that g-LDSC gives more precise estimates of functional enrichment than the prevailing method, whether or not the model is misspecified. When applied to GWAS summary statistics for 15 traits from the UK Biobank, g-LDSC produced more conservative and realistic functional enrichment estimates than s-LDSC.
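The FGLS idea behind g-LDSC can be illustrated with a small simulation. This is a minimal sketch, not the thesis's implementation, and it rests on stated assumptions: standardized genotypes, random per-SNP effects $\beta_k \sim N(0, d_k)$ with $d_k = \sum_c a_{kc}\tau_c$, and an intercept fixed at 1 (no confounding). Under these assumptions the z-scores are Gaussian, $z \sim N(0, \Sigma)$ with $\Sigma = R + N R D R$ (ignoring the $(1-h^2)$ factor on the noise term, as LD score regression does), so $E[\chi^2_j] = 1 + N\sum_k r_{jk}^2 d_k$ is linear in $\tau$ through the squared LD matrix, and $\mathrm{Cov}(\chi^2_j, \chi^2_k) = 2\Sigma_{jk}^2$. A pilot fit by plain OLS (s-LDSC itself uses weighted least squares with heuristic weights) gives $\hat\tau$, which is plugged into $\Sigma$ to build the error covariance for a GLS refit. The toy LD matrix, the annotation, the $\tau$ values, and the function names `ar1_block_ld`, `fgls_ldsc`, and `enrichment` are all hypothetical.

```python
import numpy as np


def ar1_block_ld(m: int, block: int, rho: float) -> np.ndarray:
    """Toy block-diagonal LD matrix: r_jk = rho**|j-k| within blocks, 0 across blocks."""
    idx = np.arange(block)
    blk = rho ** np.abs(idx[:, None] - idx[None, :])
    return np.kron(np.eye(m // block), blk)


def gls(X: np.ndarray, y: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Generalized least squares: (X' omega^-1 X)^-1 X' omega^-1 y."""
    wx = np.linalg.solve(omega, X)
    wy = np.linalg.solve(omega, y)
    return np.linalg.solve(X.T @ wx, X.T @ wy)


def fgls_ldsc(chi2, R, A, n):
    """Two-step FGLS for E[chi2] = 1 + n * (R∘R) A tau, with the intercept fixed at 1."""
    X = n * (R * R) @ A  # n times annotation-stratified LD scores
    y = chi2 - 1.0
    # Step 1: ordinary least squares gives a pilot estimate of tau.
    tau_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Step 2: plug pilot per-SNP variances d = A tau (negatives clipped to 0)
    # into Cov(z) = R + n R D R, then use Cov(z_j^2, z_k^2) = 2 * Sigma_jk^2.
    d = np.clip(A @ tau_ols, 0.0, None)
    sigma = R + n * (R * d) @ R
    omega = 2.0 * sigma * sigma
    # Step 3: GLS with the estimated (correlated) error covariance.
    tau_fgls = gls(X, y, omega)
    return tau_ols, tau_fgls


def enrichment(tau, A, col):
    """(Share of h2 in annotation col) / (share of SNPs in annotation col)."""
    d = A @ tau
    in_c = A[:, col] > 0
    return (d[in_c].sum() / d.sum()) / in_c.mean()


rng = np.random.default_rng(1)
m, n = 1000, 10_000
R = ar1_block_ld(m, block=50, rho=0.5)
# Column 0: all SNPs (baseline); column 1: hypothetical annotation covering 20% of SNPs.
A = np.column_stack([np.ones(m), (np.arange(m) % 5 == 0).astype(float)])
tau_true = np.array([2.5e-4, 1.25e-3])  # total h2 = 0.5, enrichment = 3
sigma_true = R + n * (R * (A @ tau_true)) @ R
z = np.linalg.cholesky(sigma_true) @ rng.standard_normal(m)
tau_ols, tau_fgls = fgls_ldsc(z ** 2, R, A, n)
for name, t in [("OLS", tau_ols), ("FGLS", tau_fgls)]:
    print(f"{name}: h2 = {(A @ t).sum():.3f}, enrichment = {enrichment(t, A, 1):.2f}")
```

Under this toy model the OLS fit is unbiased; FGLS aims to reduce variance by down-weighting SNPs whose $\chi^2$-statistics are strongly correlated with those of their neighbors. A fair comparison of the two would require many simulation replicates rather than a single run.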
Across the 24 functional annotations and 15 traits, g-LDSC also identified more significantly enriched annotations than s-LDSC (118 vs. 51 annotation-trait pairs). The second tool, best subset selection using GWAS summary statistics (BSsum), estimates PGS using $L_0$-norm penalized regression. Simulation studies under diverse scenarios show that the $L_0$ norm outperforms the $L_1$ norm in high-sparsity, low-polygenicity scenarios. These statistical tools may help researchers extract more information from GWAS summary data.
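The contrast between the $L_0$ and $L_1$ penalties can be sketched with summary statistics alone. With standardized genotypes, the least-squares loss $\lVert y - Xb\rVert^2/(2N)$ equals, up to a constant, $\tfrac12 b^\top R b - b^\top r$, where $r$ is the vector of marginal SNP-trait correlations (for example $z/\sqrt{N}$) and $R$ is the LD matrix, so both penalties can be applied without individual-level data. The code below is not BSsum's algorithm: it uses iterative hard thresholding (IHT) with an assumed known sparsity level $k$ as one standard way to fit the $L_0$-constrained problem, and cyclic coordinate descent for the $L_1$ (lasso-type) problem. The simulation settings, `k`, `lam`, and the function names `hard_threshold`, `l0_iht`, and `l1_cd` are hypothetical.

```python
import numpy as np


def hard_threshold(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out


def l0_iht(r, R, k, max_iter=500, tol=1e-10):
    """Iterative hard thresholding for min 0.5 b'Rb - b'r subject to ||b||_0 <= k."""
    step = 1.0 / np.linalg.eigvalsh(R)[-1]  # 1 / largest eigenvalue of R
    b = np.zeros_like(r)
    for _ in range(max_iter):
        # Gradient step on the summary-statistic loss, then project onto k-sparse vectors.
        b_new = hard_threshold(b + step * (r - R @ b), k)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b


def l1_cd(r, R, lam, max_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for min 0.5 b'Rb - b'r + lam * ||b||_1."""
    b = np.zeros_like(r)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Partial residual for SNP j with its own contribution removed.
            rho_j = r[j] - R[j] @ b + R[j, j] * b[j]
            new = np.sign(rho_j) * max(abs(rho_j) - lam, 0.0) / R[j, j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b


rng = np.random.default_rng(7)
m, n, block, rho = 200, 50_000, 20, 0.5
idx = np.arange(block)
R = np.kron(np.eye(m // block), rho ** np.abs(idx[:, None] - idx[None, :]))
causal = np.array([10, 50, 90, 130, 170])  # one causal SNP in each of five LD blocks
beta = np.zeros(m)
beta[causal] = 0.1
# Summary-statistic marginal correlations: r = R beta + e, with e ~ N(0, R / n).
r = R @ beta + np.linalg.cholesky(R) @ rng.standard_normal(m) / np.sqrt(n)

b_l0 = l0_iht(r, R, k=5)
b_l1 = l1_cd(r, R, lam=0.02)
for name, b in [("L0", b_l0), ("L1", b_l1)]:
    print(f"{name}: {np.count_nonzero(b)} nonzero, squared error {np.sum((b - beta) ** 2):.2e}")
```

In this high-sparsity setting the $L_0$ fit keeps exactly $k$ SNPs and leaves their effects unshrunk, whereas the $L_1$ fit shrinks retained effects toward zero and may admit small false positives. In practice $k$ or `lam` would need to be tuned, for example on a validation sample.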
Degree: Doctor of Philosophy
Subject: Personality - Genetic aspects
Dept/Program: Psychiatry
Persistent Identifier: http://hdl.handle.net/10722/358310

DC Field | Value | Language
dc.contributor.advisor | Sham, PC | -
dc.contributor.advisor | Zhang, Y | -
dc.contributor.author | Xiong, Zewei | -
dc.contributor.author | 熊泽蔚 | -
dc.date.accessioned | 2025-07-31T14:06:42Z | -
dc.date.available | 2025-07-31T14:06:42Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Xiong, Z. [熊泽蔚]. (2025). Exploring statistical methods for estimating heritability, functional enrichment and polygenic risk score using GWAS summary data in complex traits. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/358310 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Personality - Genetic aspects | -
dc.title | Exploring statistical methods for estimating heritability, functional enrichment and polygenic risk score using GWAS summary data in complex traits | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Psychiatry | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2025 | -
dc.identifier.mmsid | 991045004195803414 | -
