postgraduate thesis: A factor analysis approach to transcription regulatory network reconstruction using gene expression data

Title: A factor analysis approach to transcription regulatory network reconstruction using gene expression data
Authors: Chen, Wei [陈玮]
Advisors: Hung, YS; Chang, C
Issue Date: 2012
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chen, W. [陈玮]. (2012). A factor analysis approach to transcription regulatory network reconstruction using gene expression data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961778
Abstract: Reconstruction of the Transcription Regulatory Network (TRN) and Transcription Factor Activity (TFA) from gene expression data is an important problem in systems biology. Various factor analysis methods exist for TRN reconstruction, but most rely on specific assumptions that real biological data do not satisfy. Network Component Analysis (NCA) can handle such limitations and is considered one of the most effective methods. The prerequisite for NCA is knowledge of the structure of the TRN. Such structure can be obtained from ChIP-chip or ChIP-seq experiments, which, however, have quite limited applications. To cope with this difficulty, we resort to a heuristic optimization algorithm such as Particle Swarm Optimization (PSO) to explore the possible structures of the TRN and choose the most plausible one. For the structure estimation problem, we extend classical PSO and propose a novel Probabilistic binary PSO. Furthermore, an improved NCA called FastNCA is adopted to compute the objective function accurately and quickly, which enables PSO to run efficiently. Since heuristic optimization cannot guarantee global convergence, we run PSO multiple times and integrate the results. Then GCV-LASSO (Generalized Cross Validation - Least Absolute Shrinkage and Selection Operator) is performed to estimate the TRN. We apply our approach and other factor analysis methods to synthetic data. The results indicate that the proposed PSO-FastNCA-GCV-LASSO algorithm gives better estimation. To incorporate more prior information on TRN structure and gene expression dynamics into the linear factor analysis model for improved estimation of the TRN and TFAs, a linear Bayesian framework is adopted. Under the unified Bayesian framework, the Bayesian Linear Sparse Factor Analysis Model (BLSFM) and the Bayesian Linear State Space Model (BLSSM) are developed for instantaneous and dynamic TRNs, respectively.
Various approaches to incorporating partial and ambiguous prior network structure information in the Bayesian framework are proposed to improve performance in practical applications. Furthermore, we propose a novel mechanism for estimating the hyper-parameters of the prior distributions in our BLSFM and BLSSM models, which can significantly improve estimation compared to traditional ways of setting hyper-parameters. With this development, reasonably good estimation of TFAs and the TRN can be obtained even without using any structure prior for the TRN. Extensive numerical experiments are performed to investigate the developed methods under various settings, with comparison to some existing alternative approaches. It is demonstrated that our hyper-parameter estimation method improves the estimation of TFAs and the TRN in most settings and has superior performance, and that structure priors in general lead to improved estimation performance. Regarding application to real biological data, we execute the PSO-FastNCA-GCV-LASSO algorithm developed in the thesis on E. coli microarray data and obtain sensible estimates of TFAs and the TRN. We apply BLSFM without structure priors, and BLSSM both without structure priors and with partial structure priors, to yeast (S. cerevisiae) microarray data and obtain reasonable estimates of TFAs and the TRN.
Degree: Doctor of Philosophy
Subject: Factor analysis.
         Genetic transcription - Regulation - Statistical methods.
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/180958
HKU Library Item ID: b4961778

 

DC Field | Value | Language
dc.contributor.advisor | Hung, YS | -
dc.contributor.advisor | Chang, C | -
dc.contributor.author | Chen, Wei | -
dc.contributor.author | 陈玮 | -
dc.date.accessioned | 2013-02-07T06:21:25Z | -
dc.date.available | 2013-02-07T06:21:25Z | -
dc.date.issued | 2012 | -
dc.identifier.citation | Chen, W. [陈玮]. (2012). A factor analysis approach to transcription regulatory network reconstruction using gene expression data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961778 | -
dc.identifier.uri | http://hdl.handle.net/10722/180958 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.source.uri | http://hub.hku.hk/bib/B49617783 | -
dc.subject.lcsh | Factor analysis. | -
dc.subject.lcsh | Genetic transcription - Regulation - Statistical methods. | -
dc.title | A factor analysis approach to transcription regulatory network reconstruction using gene expression data | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b4961778 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b4961778 | -
dc.date.hkucongregation | 2013 | -
dc.identifier.mmsid | 991034140159703414 | -
