postgraduate thesis: Statistical and machine learning methods and algorithms for analyzing data from omics technologies

Title: Statistical and machine learning methods and algorithms for analyzing data from omics technologies
Authors: Yan, Kang (嚴康)
Advisors: Pang, HMH; Leung, GM; Wu, JTK
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yan, K. [嚴康]. (2020). Statistical and machine learning methods and algorithms for analyzing data from omics technologies. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: We are now in an era of massive high-throughput omics data. Understanding the structure and characteristics of the various omics data types, and choosing appropriate analytical methods, are crucial to the correct interpretation of the underlying biological and disease mechanisms. However, numerous human omics studies use outdated algorithms that do not fully exploit the potential of omics data. Hence, there is high demand for computational tools that can be applied to the storage, processing, analysis, and interpretation of various omics data. More specifically, it is imperative to develop well-designed computational, mathematical, statistical, and machine learning approaches that improve the analysis and interpretation of omics data in human studies. In view of these needs, this research focused on analytical algorithms for genetic and phenotypic trait association analysis with data generated by different omics technologies. Recent developments in omics technologies offer an unprecedented opportunity to understand human health and complex diseases through the use of different omics features. Integrating diverse data sources can facilitate thorough analysis of complex phenotypic traits by discovering patterns that appear consistently across different experiments. Therefore, this thesis first provided a comprehensive comparison and evaluation of graph- and kernel-based omics integration classification algorithms, taking into account various classification performance metrics as well as computation time. The empirical evaluation on hypertension, breast cancer, and ovarian cancer data sets suggested that the better performers were the composite association network, the relevance vector machine, and the Ada-boost relevance vector machine.
Biomedical imaging, a powerful technique for visualizing biological activities and structures, is generally less invasive than some existing clinical examinations used for the diagnosis and prognosis of diseases. Numerous radiomics features derived from biomedical imaging can be used to identify diseases and predict therapeutic responses. Accordingly, a novel machine learning analytical framework is presented that better exploits the high-dimensional nature of radiomics features extracted from biomedical imaging for right-censored survival outcomes. Expression quantitative trait loci (eQTL) analysis aims to discover genetic variants that regulate gene expression. This thesis presented a penalty-based multivariable regression model for the simultaneous discovery of multiple phenotypic trait-associated genetic variants while accounting for non-genetic and genetic confounding, as well as a Bayesian hierarchical framework. The Bayesian framework uses summary statistics to jointly identify the credible set of true eQTLs across multiple tissues while modeling the linkage disequilibrium structure of the genetic variants and their corresponding epigenetic annotations. Experiments with simulated scenarios, imaging data from non-small cell lung cancer and head and neck cancer, and eQTL data retrieved from the Genotype-Tissue Expression (GTEx) consortium demonstrated the improved performance of the three proposed algorithms over some existing methodologies. Overall, the thesis contributes to the development and implementation of statistical and machine learning approaches for analyzing various omics data types with genetic, phenotypic, and survival traits.
Degree: Doctor of Philosophy
Subject: Computational biology; Genomics - Statistical methods; Meta-analysis
Dept/Program: Public Health
Persistent Identifier: http://hdl.handle.net/10722/301048

 

DC Field | Value | Language
dc.contributor.advisor | Pang, HMH | -
dc.contributor.advisor | Leung, GM | -
dc.contributor.advisor | Wu, JTK | -
dc.contributor.author | Yan, Kang | -
dc.contributor.author | 嚴康 | -
dc.date.accessioned | 2021-07-16T14:38:43Z | -
dc.date.available | 2021-07-16T14:38:43Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Yan, K. [嚴康]. (2020). Statistical and machine learning methods and algorithms for analyzing data from omics technologies. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/301048 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Computational biology | -
dc.subject.lcsh | Genomics - Statistical methods | -
dc.subject.lcsh | Meta-analysis | -
dc.title | Statistical and machine learning methods and algorithms for analyzing data from omics technologies | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Public Health | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2020 | -
dc.identifier.mmsid | 991044284192303414 | -
