postgraduate thesis: Computational and statistical methods for bulk and single cell RNA sequencing data

Title: Computational and statistical methods for bulk and single cell RNA sequencing data
Authors: Zhao, Jiaying (赵家英)
Advisor(s): Ching, WK
Issue Date: 2025
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. By offering significant insights into gene expression dynamics, the resulting transcriptomic data, such as bulk and single cell RNA-seq data, promise to advance our understanding of how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder the discovery of the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variations before running downstream pipelines. When imputing unwanted dropout events, two concerns need to be addressed simultaneously in one comprehensive framework: the massive scale of the observations and the interconnected nature of genes and cells. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed a distributed stochastic gradient descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that integratively addresses technical variations during gene regulatory network inference based on a structural equation model. As higher-order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can contribute to unveiling sophisticated mechanisms of transcriptional regulation.
Moving beyond static snapshots, we investigate the intrinsic feature of elasticity from temporal gene expression measurements during the perturbation response process, where we aim to give a rigorous definition that quantifies the subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with elasticity variations, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve the efficiency of the proposed estimators. The framework for elasticity inference is currently applicable to bulk RNA-seq data. Based on current trends, we expect that rapidly developing high-throughput techniques will yield an ever-increasing number of cell-level transcriptional profiles over broader timescales following perturbations in the foreseeable future, and the framework can then be extended to accommodate the single cell scenario.
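The imputation idea mentioned in the abstract (non-negative matrix factorization fitted by stochastic gradient descent) can be illustrated with a minimal sketch. This is not the thesis's method: the gene and cell correlation terms and the distributed execution are omitted, and the function name `nmf_impute_sgd`, the choice to treat zeros as missing entries, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np

def nmf_impute_sgd(X, rank=2, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Impute zero entries of a non-negative genes x cells matrix X.

    Assumption (not from the thesis): zeros are treated as missing values
    (possible dropouts), so only non-zero entries contribute to the loss.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.uniform(0.1, 1.0, size=(n_genes, rank))  # gene factors
    H = rng.uniform(0.1, 1.0, size=(n_cells, rank))  # cell factors
    rows, cols = np.nonzero(X)                       # observed entries
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch
        for k in rng.permutation(len(rows)):
            i, j = rows[k], cols[k]
            err = X[i, j] - W[i] @ H[j]
            w_old = W[i].copy()
            # Gradient step, then projection onto the non-negative orthant
            W[i] = np.maximum(W[i] + lr * (err * H[j] - reg * W[i]), 0.0)
            H[j] = np.maximum(H[j] + lr * (err * w_old - reg * H[j]), 0.0)
    X_hat = W @ H.T
    # Keep observed values and fill only the zero entries
    return np.where(X > 0, X, X_hat), W, H

# Toy usage: a 3 genes x 3 cells matrix with three zero entries
X_demo = np.array([[1.0, 0.0, 2.0],
                   [2.0, 4.0, 0.0],
                   [0.0, 2.0, 2.0]])
X_imputed, W_demo, H_demo = nmf_impute_sgd(X_demo, rank=2, epochs=50)
```

In this sketch the projection step keeps both factor matrices non-negative, and the observed counts are left unchanged so that only presumed dropouts are filled in.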
Degree: Doctor of Philosophy
Subject: Nucleotide sequence - Mathematical models
Subject: Nucleotide sequence - Statistical methods
Dept/Program: Mathematics
Persistent Identifier: http://hdl.handle.net/10722/367454

 

DC Field | Value | Language
dc.contributor.advisor | Ching, WK | -
dc.contributor.author | Zhao, Jiaying | -
dc.contributor.author | 赵家英 | -
dc.date.accessioned | 2025-12-11T06:42:13Z | -
dc.date.available | 2025-12-11T06:42:13Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/367454 | -
dc.description.abstract | Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. By offering significant insights into gene expression dynamics, the resulting transcriptomic data, such as bulk and single cell RNA-seq data, promise to advance our understanding of how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder the discovery of the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variations before running downstream pipelines. When imputing unwanted dropout events, two concerns need to be addressed simultaneously in one comprehensive framework: the massive scale of the observations and the interconnected nature of genes and cells. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed a distributed stochastic gradient descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that integratively addresses technical variations during gene regulatory network inference based on a structural equation model. As higher-order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can contribute to unveiling sophisticated mechanisms of transcriptional regulation. Moving beyond static snapshots, we investigate the intrinsic feature of elasticity from temporal gene expression measurements during the perturbation response process, where we aim to give a rigorous definition that quantifies the subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with elasticity variations, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve the efficiency of the proposed estimators. The framework for elasticity inference is currently applicable to bulk RNA-seq data. Based on current trends, we expect that rapidly developing high-throughput techniques will yield an ever-increasing number of cell-level transcriptional profiles over broader timescales following perturbations in the foreseeable future, and the framework can then be extended to accommodate the single cell scenario. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Nucleotide sequence - Mathematical models | -
dc.subject.lcsh | Nucleotide sequence - Statistical methods | -
dc.title | Computational and statistical methods for bulk and single cell RNA sequencing data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Mathematics | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2025 | -
dc.identifier.mmsid | 991045147148703414 | -
