postgraduate thesis: Statistical and machine learning methods and algorithms for analyzing data from omics technologies
Title | Statistical and machine learning methods and algorithms for analyzing data from omics technologies |
---|---|
Authors | Yan, Kang (嚴康) |
Advisors | Pang, HMH; Leung, GM; Wu, JTK |
Issue Date | 2020 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Yan, K. [嚴康]. (2020). Statistical and machine learning methods and algorithms for analyzing data from omics technologies. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | We are now in an era of massive high-throughput omics data. Understanding the structure and characteristics of various omics data, as well as choosing appropriate analytical methods, is crucial to correctly interpreting the underlying biological and disease mechanisms. However, many human omics studies rely on outdated algorithms that do not fully exploit the potential of omics data. Hence, there is high demand for computational tools for the storage, processing, analysis, and interpretation of various omics data. More specifically, well-designed computational, mathematical, statistical, and machine learning approaches are needed to improve the analysis and interpretation of omics data in human studies. In view of these needs, this research focused on analytical algorithms for genetic and phenotypic trait association analysis with omics data generated from different omics technologies.
Recent developments in omics technologies have offered an unprecedented opportunity to understand human health and complex diseases through different omics features. Integrating diverse data sources can support thorough analysis of complex phenotypic traits by discovering patterns that are consistently observed across different experiments. Therefore, this thesis first provided a comprehensive comparison and evaluation of graph- and kernel-based omics integration classification algorithms, taking into account various classification performance metrics as well as computation time. The empirical evaluation on hypertension, breast cancer, and ovarian cancer data sets suggested that the better-performing methods were composite association network, relevance vector machine, and AdaBoost relevance vector machine.
Biomedical imaging, a powerful technique for visualizing biological activities and structures, is generally less invasive than some existing clinical examinations and inspections used for the diagnosis and prognosis of diseases. Numerous radiomics features derived from biomedical imaging can be used to diagnose diseases and predict therapeutic responses. Accordingly, a novel machine learning framework is presented that better exploits the high-dimensional radiomics features extracted from biomedical imaging to model right-censored survival outcomes.
Expression quantitative trait loci (eQTL) analysis aims to discover genetic variants that regulate gene expression. This thesis presented two further methods. The first is a penalty-based multivariable regression model for the simultaneous discovery of multiple phenotypic trait-associated genetic variants while accounting for non-genetic and genetic confounding. The second is a Bayesian hierarchical framework that uses summary statistics to jointly identify the credible set of true eQTLs across multiple tissues, modeling the linkage disequilibrium structure of genetic variants and their corresponding epigenetic annotations.
Experiments with simulated scenarios, imaging data from non-small cell lung cancer and head and neck cancer, and eQTL data retrieved from the Genotype-Tissue Expression (GTEx) consortium demonstrated the improved performance of the three proposed algorithms over some existing methods. Overall, the thesis contributes to the development and implementation of statistical and machine learning approaches for analyzing various omics data types with genetic, phenotypic, and survival traits. |
Degree | Doctor of Philosophy |
Subject | Computational biology; Genomics - Statistical methods; Meta-analysis |
Dept/Program | Public Health |
Persistent Identifier | http://hdl.handle.net/10722/301048 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Pang, HMH | - |
dc.contributor.advisor | Leung, GM | - |
dc.contributor.advisor | Wu, JTK | - |
dc.contributor.author | Yan, Kang | - |
dc.contributor.author | 嚴康 | - |
dc.date.accessioned | 2021-07-16T14:38:43Z | - |
dc.date.available | 2021-07-16T14:38:43Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Yan, K. [嚴康]. (2020). Statistical and machine learning methods and algorithms for analyzing data from omics technologies. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/301048 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Computational biology | - |
dc.subject.lcsh | Genomics - Statistical methods | - |
dc.subject.lcsh | Meta-analysis | - |
dc.title | Statistical and machine learning methods and algorithms for analyzing data from omics technologies | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Public Health | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2020 | - |
dc.identifier.mmsid | 991044284192303414 | - |