Computational and statistical methods for bulk and single cell RNA sequencing data

Zhao, Jiaying; 赵家英

Appears in Collections:
- HKU Theses Online
- Mathematics: Theses

postgraduate thesis: Computational and statistical methods for bulk and single cell RNA sequencing data

Title	Computational and statistical methods for bulk and single cell RNA sequencing data
Authors	Zhao, Jiaying 赵家英
Advisors	Ching, WK
Issue Date	2025
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. By offering significant insights into gene expression dynamics, the resulting transcriptomic data, such as bulk and single cell RNA-seq data, promise to advance our understanding of how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder efforts to uncover the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variation before running downstream analysis pipelines. When imputing dropout events, two concerns must be addressed simultaneously within one comprehensive framework: the massive scale of the observations and the interconnected nature of genes and cells. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed a distributed Stochastic Gradient Descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that addresses technical variation in an integrated manner during gene regulatory network inference based on a structural equation model. As higher-order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can help unveil sophisticated mechanisms of transcriptional regulation.
Moving beyond static snapshots, we investigate elasticity as an intrinsic feature of temporal gene expression measurements taken during the perturbation response process, and we aim to give a rigorous definition that quantifies a subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with variation in elasticity, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve their efficiency. The framework for elasticity inference is currently applicable to bulk RNA-seq data. Based on current trends, we expect that rapidly developing high-throughput techniques will yield ever more cell-level transcriptional profiles over broader timescales following perturbations, at which point the framework can be extended to accommodate the single cell scenario.
Degree	Doctor of Philosophy
Subject	Nucleotide sequence - Mathematical models; Nucleotide sequence - Statistical methods
Dept/Program	Mathematics
Persistent Identifier	http://hdl.handle.net/10722/367454
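
The abstract above describes imputing dropout events with a non-negative matrix factorization that incorporates gene and cell correlations and is optimized by SGD. This record does not give the thesis's objective function or its distributed optimizer, so the following is only a minimal single-machine sketch under stated assumptions: zeros are treated as unobserved entries, gene and cell correlations enter as a graph-smoothness penalty on the factors, and non-negativity is enforced by projection. The function name `impute_nmf_sgd` and every parameter are hypothetical.

```python
import random

def impute_nmf_sgd(X, gene_sim, cell_sim, rank=1, lr=0.01, reg=0.01,
                   graph_weight=0.05, epochs=500, seed=0):
    """Fit X ~ W @ H.T on the nonzero entries and return the dense reconstruction.

    Assumed (not from the thesis): zeros are missing values, and similar
    genes/cells are encouraged to have similar non-negative factors.
    """
    rng = random.Random(seed)
    n_genes, n_cells = len(X), len(X[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_genes)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cells)]
    observed = [(i, j) for i in range(n_genes) for j in range(n_cells)
                if X[i][j] > 0]

    def pull(F, sim, idx, f):
        # Gradient of 0.5 * sum_m sim[idx][m] * (F[idx][f] - F[m][f])**2
        # with respect to F[idx][f].
        return sum(s * (F[idx][f] - F[m][f])
                   for m, s in enumerate(sim[idx]) if m != idx)

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = X[i][j] - sum(W[i][f] * H[j][f] for f in range(rank))
            for f in range(rank):
                w, h = W[i][f], H[j][f]
                grad_w = -err * h + reg * w + graph_weight * pull(W, gene_sim, i, f)
                grad_h = -err * w + reg * h + graph_weight * pull(H, cell_sim, j, f)
                # Projected SGD step keeps both factors non-negative.
                W[i][f] = max(0.0, w - lr * grad_w)
                H[j][f] = max(0.0, h - lr * grad_h)

    return [[sum(W[i][f] * H[j][f] for f in range(rank))
             for j in range(n_cells)] for i in range(n_genes)]

# Toy data: a rank-1 genes x cells matrix with one simulated dropout.
truth = [[g * c for c in (1, 2, 1, 2)] for g in (1, 2, 3)]
X = [row[:] for row in truth]
X[1][1] = 0  # dropout; the true value is 4
gene_sim = [[0, 0.1, 0], [0.1, 0, 0.1], [0, 0.1, 0]]
cell_sim = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
imputed = impute_nmf_sgd(X, gene_sim, cell_sim)
print(round(imputed[1][1], 2))
```

In this toy setup the similarity matrices are hand-written; in practice they would have to be estimated from the data, and a large-scale version would partition the observed entries across workers rather than loop over them in one process.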

DC Field	Value	Language
dc.contributor.advisor	Ching, WK	-
dc.contributor.author	Zhao, Jiaying	-
dc.contributor.author	赵家英	-
dc.date.accessioned	2025-12-11T06:42:13Z	-
dc.date.available	2025-12-11T06:42:13Z	-
dc.date.issued	2025	-
dc.identifier.citation	Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/367454	-
dc.description.abstract	Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. By offering significant insights into gene expression dynamics, the resulting transcriptomic data, such as bulk and single cell RNA-seq data, promise to advance our understanding of how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder efforts to uncover the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variation before running downstream analysis pipelines. When imputing dropout events, two concerns must be addressed simultaneously within one comprehensive framework: the massive scale of the observations and the interconnected nature of genes and cells. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed a distributed Stochastic Gradient Descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that addresses technical variation in an integrated manner during gene regulatory network inference based on a structural equation model. As higher-order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can help unveil sophisticated mechanisms of transcriptional regulation.
Moving beyond static snapshots, we investigate elasticity as an intrinsic feature of temporal gene expression measurements taken during the perturbation response process, and we aim to give a rigorous definition that quantifies a subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with variation in elasticity, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve their efficiency. The framework for elasticity inference is currently applicable to bulk RNA-seq data. Based on current trends, we expect that rapidly developing high-throughput techniques will yield ever more cell-level transcriptional profiles over broader timescales following perturbations, at which point the framework can be extended to accommodate the single cell scenario.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Nucleotide sequence - Mathematical models	-
dc.subject.lcsh	Nucleotide sequence - Statistical methods	-
dc.title	Computational and statistical methods for bulk and single cell RNA sequencing data	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Mathematics	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2025	-
dc.identifier.mmsid	991045147148703414	-
