postgraduate thesis: Computational and statistical methods for bulk and single cell RNA sequencing data
| Title | Computational and statistical methods for bulk and single cell RNA sequencing data |
|---|---|
| Authors | Zhao, Jiaying (赵家英) |
| Advisors | Ching, WK |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. Bringing significant insights in gene expression dynamics, the generated transcriptomic data such as bulk and single cell RNA-seq data is promising to advance our understanding on how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder the way of uncovering the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variations before conducting downstream practical pipelines. When imputing unwanted dropout events, two concerns on the massive scales of the observation as well as the interconnected nature of genes and cells need to be addressed simultaneously in one comprehensive framework. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed distributed Stochastic Gradient Descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that integratively addresses technical variations during gene regulatory network inference based on structural equation model. As higher order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can contribute to unveiling sophisticated mechanisms of transcriptional regulation. Moving beyond static snapshots, we are interested in investigating the intrinsic feature of elasticity from temporal gene expression measurements during the perturbation response process, where we aim to give a rigorous definition to quantify the subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with elasticity variations, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve the efficiency of the proposed estimators. The framework for elasticity inference is now applicable to bulk RNA-seq data. Based on current trends, we expect that the rapidly developing high-throughput techniques will yield ever-increasing cell-level transcriptional profiles over broader timescales following perturbations in the foreseeable future, and the framework can be extended to accommodate the single cell scenario then. |
| Degree | Doctor of Philosophy |
| Subject | Nucleotide sequence - Mathematical models; Nucleotide sequence - Statistical methods |
| Dept/Program | Mathematics |
| Persistent Identifier | http://hdl.handle.net/10722/367454 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Ching, WK | - |
| dc.contributor.author | Zhao, Jiaying | - |
| dc.contributor.author | 赵家英 | - |
| dc.date.accessioned | 2025-12-11T06:42:13Z | - |
| dc.date.available | 2025-12-11T06:42:13Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Zhao, J. [赵家英]. (2025). Computational and statistical methods for bulk and single cell RNA sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/367454 | - |
| dc.description.abstract | Over the past decade, developments in next-generation sequencing technologies have provided unprecedented opportunities for characterizing human transcriptome landscapes across extensive experimental conditions, temporal stages, and tissue types. Bringing significant insights in gene expression dynamics, the generated transcriptomic data such as bulk and single cell RNA-seq data is promising to advance our understanding on how genetic products are regulated at the molecular level. Although efforts have been made to solve key analytical tasks, such as normalization, differential expression analysis, cell type clustering, and gene network inference, significant challenges remain unresolved and therefore hinder the way of uncovering the underlying biological mechanisms that generate the observed transcriptional response. A first step toward improving gene expression analysis is to remove technical variations before conducting downstream practical pipelines. When imputing unwanted dropout events, two concerns on the massive scales of the observation as well as the interconnected nature of genes and cells need to be addressed simultaneously in one comprehensive framework. To do so, we proposed a non-negative matrix factorization framework that jointly incorporates gene and cell correlations for single cell gene expression imputation, and employed distributed Stochastic Gradient Descent (SGD) method for large-scale optimization. In addition to treating imputation as a standalone pre-processing step, we have developed a unified framework that integratively addresses technical variations during gene regulatory network inference based on structural equation model. As higher order features derived through computational inference rather than direct observation, the reconstructed gene regulatory relationships can contribute to unveiling sophisticated mechanisms of transcriptional regulation. Moving beyond static snapshots, we are interested in investigating the intrinsic feature of elasticity from temporal gene expression measurements during the perturbation response process, where we aim to give a rigorous definition to quantify the subject’s capability of returning to baseline after disturbance. To accommodate covariates and identify factors associated with elasticity variations, we use generalized estimating equations to derive consistent estimators and adopt an augmentation approach to improve the efficiency of the proposed estimators. The framework for elasticity inference is now applicable to bulk RNA-seq data. Based on current trends, we expect that the rapidly developing high-throughput techniques will yield ever-increasing cell-level transcriptional profiles over broader timescales following perturbations in the foreseeable future, and the framework can be extended to accommodate the single cell scenario then. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Nucleotide sequence - Mathematical models | - |
| dc.subject.lcsh | Nucleotide sequence - Statistical methods | - |
| dc.title | Computational and statistical methods for bulk and single cell RNA sequencing data | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Mathematics | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045147148703414 | - |
