
postgraduate thesis: Statistical methods for incorporating external information into current study

Title: Statistical methods for incorporating external information into current study
Authors: Lai, Daoyuan [赖道远]
Advisor(s): Zhang, Y; Sham, PC
Issue Date: 2025
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Lai, D. [赖道远]. (2025). Statistical methods for incorporating external information into current study. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis explores advanced methodologies in data integration, focusing on transfer learning techniques to enhance prediction and inference in genetic and biomedical studies. The first part focuses on improving Transcriptome-wide Association Studies (TWAS), while the second introduces a Bayesian transfer learning framework for high-dimensional regression. Both approaches address the challenge of leveraging external data to strengthen inference in target domains with limited samples. The first part of the thesis proposes a transfer learning method that can enhance the power of TWAS. TWAS utilize gene expression data to explore the genetic basis of complex traits. A key challenge in TWAS is developing robust imputation models for tissues with limited sample sizes. This paper introduces TRANSfer Learning-assisted TWAS (TransferTWAS), a novel framework that adaptively transfers information from multiple tissues to improve gene expression prediction in the target tissue. TransferTWAS adopts a data-driven strategy that assigns higher weights to genetically similar external tissues. Remarkably, it surpasses other multi-tissue TWAS methods, such as the Unified Test for Molecular Signatures (UTMOST), which neglects tissue similarity, and Joint-tissue Imputation (JTI), which relies on functional annotations to represent tissue similarity. Simulation studies demonstrate that TransferTWAS achieves the highest imputation accuracy, and analyses using the ROSMAP and GEUVADIS datasets show substantial power gains while maintaining control over type-I errors. Furthermore, analysis of the low-density lipoprotein cholesterol GWAS dataset and other complex traits demonstrates that TransferTWAS effectively identifies more associations, retrieves known genes, and uncovers novel associations. The second part of the thesis proposes a novel Bayesian transfer learning method called TRansfer leArning via guideD horseshoE prioR (TRADER).
Transfer learning enhances model performance in a target population with limited samples by leveraging knowledge from related studies. While many works focus on improving predictive performance, challenges of statistical inference persist. Bayesian approaches naturally offer uncertainty quantification for parameter estimates, yet existing Bayesian transfer learning methods are typically limited to single-source scenarios or require individual-level data. We introduce TRADER, a novel approach enabling multi-source transfer through pre-trained models in high-dimensional linear regression. TRADER shrinks target parameters toward an adaptively weighted average of source estimates, while remaining robust to differences in source scale and informativeness. Theoretical investigation shows that TRADER achieves faster posterior contraction rates than standard continuous shrinkage priors when sources are well-aligned with the target, while preventing negative transfer from heterogeneous sources. We also establish the finite-sample marginal posterior behavior of TRADER. Extensive simulations show that TRADER achieves inference performance no worse than using target data alone, performs comparably to its frequentist counterpart despite using only summary-level source data, and offers substantial computational advantages. Application to a high-dimensional genetic dataset further shows TRADER's effectiveness in inference under strong multicollinearity, outperforming existing frequentist methods.
Degree: Doctor of Philosophy
Subject: Transfer learning (Machine learning); Bayesian statistical decision theory; Regression analysis; Genomics - Data processing
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/367483

 

DC Field | Value | Language
dc.contributor.advisor | Zhang, Y | -
dc.contributor.advisor | Sham, PC | -
dc.contributor.author | Lai, Daoyuan | -
dc.contributor.author | 赖道远 | -
dc.date.accessioned | 2025-12-11T06:42:24Z | -
dc.date.available | 2025-12-11T06:42:24Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Lai, D. [赖道远]. (2025). Statistical methods for incorporating external information into current study. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/367483 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Transfer learning (Machine learning) | -
dc.subject.lcsh | Bayesian statistical decision theory | -
dc.subject.lcsh | Regression analysis | -
dc.subject.lcsh | Genomics - Data processing | -
dc.title | Statistical methods for incorporating external information into current study | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2025 | -
dc.identifier.mmsid | 991045147148603414 | -
