Links for full text (may require subscription)
- Publisher Website: 10.1371/journal.pgen.1011189
- Scopus: eid_2-s2.0-85187664740
- PMID: 38484017
Article: INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis
Title | INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis |
---|---|
Authors | Zhao, Kai; Huang, Sen; Lin, Cuichan; Sham, Pak Chung; So, Hon Cheong; Lin, Zhixiang |
Issue Date | 14-Mar-2024 |
Publisher | Public Library of Science |
Citation | PLoS Genetics, 2024, v. 20, n. 3 |
Abstract | RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few or no methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while accounting for the grouping effects of genes. It can achieve DR of high-dimensional data (of ≥3 dimensions), as opposed to conventional methods (e.g., PCA/NMF), which generally handle only 2D data (e.g., sample × expression). In addition, it enables computing adjusted expression profiles for specific biological variables while controlling for variation from other variables. INSIDER is computationally efficient and accommodates missing data. In simulations, INSIDER performed similarly to or better than a closely related competing method, SDA, and it can handle complex patterns of missing data in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing the neurodevelopmental trajectory using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions. |
Persistent Identifier | http://hdl.handle.net/10722/345781 |
ISSN | 1553-7390 (2014 Impact Factor: 7.528; 2023 SCImago Journal Rankings: 2.219) |
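The abstract describes INSIDER's model only in outline. Variation from several biological variables and their interactions is mapped into one shared low-rank latent space, gene loadings are made sparse with an elastic net penalty, and missing entries are tolerated. The toy, pure-Python sketch below illustrates that general idea under illustrative assumptions. It is not the authors' implementation (available at https://github.com/kai0511/insider) and not necessarily their exact objective. The following are assumed here. Each sample's latent vector is the sum of K-dimensional embeddings for its levels of each variable (e.g., tissue, disease status, and their tissue × status interaction), and these embeddings carry a ridge penalty. Gene loadings carry an elastic net (L1 + L2) penalty. Missing entries are simply left out of the squared-error loss, and all parameters are fitted by exact coordinate descent. The names `fit_sparse_decomposition`, `predict`, and `adjusted_profile` are hypothetical.

```python
# Hypothetical toy sketch of an INSIDER-like decomposition; NOT the INSIDER
# package's API or the paper's exact model.
import random


def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) used in lasso/elastic-net coordinate descent."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_sparse_decomposition(Y, labels, K=2, lam_row=1.0, l1=0.1, l2=0.1,
                             n_iter=100, seed=0):
    """Assumed model: Y[i][j] ~ r_i . V[j], with r_i the sum of sample i's level embeddings.

    Y      : list of samples, each a list of gene values; None marks a missing entry.
    labels : one dict per sample mapping block name -> level, e.g.
             {"tissue": "brain", "status": "case", "tissue:status": ("brain", "case")}.
    Minimizes 0.5*SSE(observed) + lam_row/2*||embeddings||^2
              + l1*|V|_1 + l2/2*||V||^2 by exact coordinate descent (l2 must be > 0).
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    blocks = list(labels[0])
    factors = {b: {} for b in blocks}
    for lab in labels:
        for b in blocks:
            factors[b].setdefault(lab[b], [rng.gauss(0, 0.1) for _ in range(K)])
    V = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(m)]

    def row_rep(i):
        # Shared low-rank representation: sum of the sample's level embeddings.
        return [sum(factors[b][labels[i][b]][k] for b in blocks) for k in range(K)]

    def dot(r, v):
        return sum(a * c for a, c in zip(r, v))

    for _ in range(n_iter):
        # 1) Ridge-penalized coordinate updates of every level embedding.
        for b in blocks:
            for level, vec in factors[b].items():
                members = [i for i in range(n) if labels[i][b] == level]
                for k in range(K):
                    num, den = 0.0, lam_row
                    for i in members:
                        r = row_rep(i)
                        for j in range(m):
                            if Y[i][j] is None:  # missing entries are skipped
                                continue
                            # Partial residual with this coordinate's contribution removed.
                            e = Y[i][j] - dot(r, V[j]) + vec[k] * V[j][k]
                            num += e * V[j][k]
                            den += V[j][k] ** 2
                    vec[k] = num / den
        # 2) Elastic-net coordinate updates of the gene loadings (L1 gives exact zeros).
        reps = [row_rep(i) for i in range(n)]
        for j in range(m):
            obs = [i for i in range(n) if Y[i][j] is not None]
            for k in range(K):
                rho, den = 0.0, l2
                for i in obs:
                    e = Y[i][j] - dot(reps[i], V[j]) + reps[i][k] * V[j][k]
                    rho += reps[i][k] * e
                    den += reps[i][k] ** 2
                V[j][k] = soft_threshold(rho, l1) / den
    return factors, V


def predict(factors, V, label):
    """Reconstructed expression for one sample described by its label dict."""
    r = [sum(factors[b][label[b]][k] for b in label) for k in range(len(V[0]))]
    return [sum(a * c for a, c in zip(r, v)) for v in V]


def adjusted_profile(factors, V, block, level):
    """Hypothetical reading: one level's embedding projected through V, other blocks left out."""
    u = factors[block][level]
    return [sum(a * c for a, c in zip(u, v)) for v in V]
```

A call such as `fit_sparse_decomposition(Y, labels, K=2, l1=0.01)` returns the level embeddings and gene loadings, where `labels` holds one `{"tissue": ..., "status": ..., "tissue:status": (...)}` dict per sample. Raising `l1` drives more loadings exactly to zero. Because each sample is simply a labelled row, the sketch also runs when some tissue × status combinations are unobserved. This loosely echoes the abstract's point about data that cannot be structured into a tensor. `adjusted_profile` is only one plausible reading of the abstract's "adjusted expression profiles".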
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhao, Kai | - |
dc.contributor.author | Huang, Sen | - |
dc.contributor.author | Lin, Cuichan | - |
dc.contributor.author | Sham, Pak Chung | - |
dc.contributor.author | So, Hon Cheong | - |
dc.contributor.author | Lin, Zhixiang | - |
dc.date.accessioned | 2024-08-28T07:40:40Z | - |
dc.date.available | 2024-08-28T07:40:40Z | - |
dc.date.issued | 2024-03-14 | - |
dc.identifier.citation | PLoS Genetics, 2024, v. 20, n. 3 | - |
dc.identifier.issn | 1553-7390 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345781 | - |
dc.language | eng | - |
dc.publisher | Public Library of Science | - |
dc.relation.ispartof | PLoS Genetics | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.title | INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis | - |
dc.type | Article | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1371/journal.pgen.1011189 | - |
dc.identifier.pmid | 38484017 | - |
dc.identifier.scopus | eid_2-s2.0-85187664740 | - |
dc.identifier.volume | 20 | - |
dc.identifier.issue | 3 | - |
dc.identifier.eissn | 1553-7404 | - |
dc.identifier.issnl | 1553-7390 | - |