
Article: INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis

Title: INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis
Authors: Zhao, Kai; Huang, Sen; Lin, Cuichan; Sham, Pak Chung; So, Hon Cheong; Lin, Zhixiang
Issue Date: 14-Mar-2024
Publisher: Public Library of Science
Citation: PLoS Genetics, 2024, v. 20, n. 3
Abstract

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few or no methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously, while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. In particular, it introduces the elastic net penalty to induce sparsity while considering the grouping effects of genes. It can achieve DR of high-dimensional data (of ≥ 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF), which generally only handle 2D data (e.g., sample × expression). In addition, it enables computing adjusted expression profiles for specific biological variables while controlling variation from other variables. INSIDER is computationally efficient and accommodates missing data. In simulations, INSIDER performed similarly to or outperformed a close competing method, SDA, and it can handle complex missing data in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neuro-development trajectory using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.
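The general idea in the abstract (sample-level latent vectors built from several biological variables, a shared gene loading matrix, an elastic net penalty, and a loss over observed entries only) can be illustrated with a small sketch. This is not the INSIDER implementation or its API. The function names, the additive donor-plus-tissue form, the omission of interaction terms, and the parameters `lam` and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def elastic_net_penalty(V, lam=1.0, alpha=0.5):
    """Generic elastic net: lam * (alpha * ||V||_1 + (1 - alpha) / 2 * ||V||_F^2)."""
    l1 = np.abs(V).sum()
    l2 = 0.5 * np.square(V).sum()
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

def reconstruct(donor_idx, tissue_idx, U_donor, U_tissue, V):
    """Hypothetical additive model: each sample's latent vector is the sum of
    its donor and tissue representations in a shared rank-k space."""
    latent = U_donor[donor_idx] + U_tissue[tissue_idx]  # (n_samples, k)
    return latent @ V.T                                  # (n_samples, n_genes)

def masked_squared_error(Y, Y_hat, observed):
    """Squared error over observed entries only, so missing values are ignored."""
    resid = (Y - Y_hat)[observed]
    return 0.5 * np.square(resid).sum()

# Toy data: 3 donors, 2 tissues, 4 samples, 5 genes, rank 2 (all made up).
rng = np.random.default_rng(0)
k, n_genes = 2, 5
U_donor = rng.normal(size=(3, k))
U_tissue = rng.normal(size=(2, k))
V = rng.normal(size=(n_genes, k))
donor_idx = np.array([0, 1, 2, 0])
tissue_idx = np.array([0, 0, 1, 1])

Y_hat = reconstruct(donor_idx, tissue_idx, U_donor, U_tissue, V)
Y = Y_hat.copy()
Y[0, 0] = np.nan                      # simulate one missing measurement
observed = ~np.isnan(Y)

objective = masked_squared_error(Y, Y_hat, observed) + elastic_net_penalty(V)
```

In this sketch the samples are indexed by their covariates rather than arranged in a donor × tissue × gene array, which is one way a model of this kind can be fitted when the data do not form a complete tensor.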


Persistent Identifier: http://hdl.handle.net/10722/345781
ISSN: 1553-7390
2014 Impact Factor: 7.528
2023 SCImago Journal Rankings: 2.219

 

DC Field | Value | Language
dc.contributor.author | Zhao, Kai | -
dc.contributor.author | Huang, Sen | -
dc.contributor.author | Lin, Cuichan | -
dc.contributor.author | Sham, Pak Chung | -
dc.contributor.author | So, Hon Cheong | -
dc.contributor.author | Lin, Zhixiang | -
dc.date.accessioned | 2024-08-28T07:40:40Z | -
dc.date.available | 2024-08-28T07:40:40Z | -
dc.date.issued | 2024-03-14 | -
dc.identifier.citation | PLoS Genetics, 2024, v. 20, n. 3 | -
dc.identifier.issn | 1553-7390 | -
dc.identifier.uri | http://hdl.handle.net/10722/345781 | -
dc.language | eng | -
dc.publisher | Public Library of Science | -
dc.relation.ispartof | PLoS Genetics | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1371/journal.pgen.1011189 | -
dc.identifier.pmid | 38484017 | -
dc.identifier.scopus | eid_2-s2.0-85187664740 | -
dc.identifier.volume | 20 | -
dc.identifier.issue | 3 | -
dc.identifier.eissn | 1553-7404 | -
dc.identifier.issnl | 1553-7390 | -
