
Postgraduate thesis: A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments

Title: A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments
Author: Tam, Hak-fui (譚克奎)
Advisor(s): Hung, YS
Issue Date: 2012
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Tam, H. [譚克奎]. (2012). A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Discovering gene regulatory networks (GRNs) from gene expression data is a promising direction for deciphering the biological mechanisms that underlie many basic aspects of scientific and medical advances. In this thesis, we focus on reconstructing GRNs from time-series data using a Granger causality (GC) approach. As there is little existing research on combining data from multiple time-series experiments, we identify the need for a methodology, with underlying theory, that combines multiple experiments to obtain statistically significant discoveries. We derive a statistical theory for the intersection of two discovered networks. This statistical framework is novel and was developed for our GRN discovery problem; however, it is not limited to GRN or GC and may be applied to any problem in which one can take the intersection of discoveries obtained from multiple experiments (or datasets). We propose several novel methods for combining data from multiple experiments. Our single underlying model (SUM) method regresses the data of all experiments in one go, enabling GC to fully utilize the information in the original data. Based on our statistical theory and SUM, we develop new meta-analysis methods, including union of pairwise common edges (UPCE) and leave-one-out hybrid of SUM and UPCE (LOOHSU). Applications to synthetic and real data show that our new methods give discoveries of substantially higher precision than traditional meta-analysis. We also propose methods for estimating the precision of GC-discovered networks, filling an important gap in the literature. This allows us to assess how good a discovered network is when the ground truth is unknown, as is typical in most biological applications.
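The abstract names UPCE but does not define it. Reading the name literally, a minimal Python sketch might intersect the edge sets discovered in every pair of experiments and take the union of those intersections, so an edge is kept whenever at least two experiments agree on it. This is an assumption based on the method's name only: the thesis's statistical theory for intersections (for example, how per-experiment significance levels are chosen) is not reproduced here, and the function name and gene names are hypothetical.

```python
from itertools import combinations

# A directed edge (regulator, target); the gene names used below are hypothetical.
Edge = tuple[str, str]


def union_of_pairwise_common_edges(networks: list[set[Edge]]) -> set[Edge]:
    """Hypothetical reading of UPCE: union over all experiment pairs of the
    edges both experiments discovered (i.e. edges found in >= 2 experiments)."""
    result: set[Edge] = set()
    for a, b in combinations(networks, 2):
        result |= a & b  # edges common to this pair of experiments
    return result


# Illustrative edge sets, as if discovered by GC in three separate experiments.
exp1 = {("g1", "g2"), ("g2", "g3"), ("g4", "g1")}
exp2 = {("g1", "g2"), ("g3", "g4")}
exp3 = {("g2", "g3"), ("g3", "g4"), ("g5", "g1")}

combined = union_of_pairwise_common_edges([exp1, exp2, exp3])
print(sorted(combined))  # -> [('g1', 'g2'), ('g2', 'g3'), ('g3', 'g4')]
```

Edges reported by only one experiment, such as ("g4", "g1") here, are dropped. A false edge therefore survives only if two experiments make the same error, which is plausibly why intersection-based combination can raise precision.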
Our precision estimation by half-half splitting with combinations (HHSC) gives an estimate much closer to the true value than the estimate computed from the Benjamini-Hochberg false discovery rate (FDR) controlling procedure. Furthermore, using a network covering notion, we design a method that can identify a small number of links with high precision (around 0.8-0.9), which may relieve the burden of testing many low-precision hypothetical interactions in biological experiments. When the number of genes is much larger than the data length, full-model GC cannot be applied, and GC is often applied to genes pairwise instead. We analyze how spurious causalities (false discoveries) may arise in this setting and demonstrate that model validation can effectively remove them. With our proposed implementation, in which model orders are fixed by the Akaike information criterion and every model is subject to validation, we report a new observation: network hubs tend to act as sources rather than receivers of interactions.
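For reference, the Benjamini-Hochberg procedure that HHSC is compared against works as follows: sort the m p-values, find the largest rank k with p_(k) <= k*q/m, and reject the k hypotheses with the smallest p-values. A minimal Python sketch is given below. Treating 1 - q as the precision implied by the resulting network is our assumption about how a BH-based baseline is read, since the abstract does not spell out that computation. The function name and the p-values (imagined as per-edge GC test results) are illustrative.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[int]:
    """Return the sorted indices of hypotheses rejected by the BH step-up
    procedure at FDR level q."""
    m = len(p_values)
    # Indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is within its threshold rank * q / m
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    # Step-up: reject every hypothesis ranked at or below k, even if some
    # intermediate p-value exceeded its own threshold.
    return sorted(order[:k])


# Illustrative p-values for four candidate edges.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20], q=0.05))  # -> [0]
```

The step-up rule explains why an input like [0.01, 0.03, 0.035, 0.04] rejects all four hypotheses at q = 0.05, even though 0.03 misses its rank-2 threshold of 0.025.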
Degree: Doctor of Philosophy
Subject: Gene regulatory networks - Statistical methods.
Dept/Program: Electrical and Electronic Engineering

 

DC Field | Value | Language
dc.contributor.advisor | Hung, YS | -
dc.contributor.author | Tam, Hak-fui. | -
dc.contributor.author | 譚克奎. | -
dc.date.issued | 2012 | -
dc.identifier.citation | Tam, H. [譚克奎]. (2012). A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use it in future works. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.source.uri | http://hub.hku.hk/bib/B49764251 | -
dc.subject.lcsh | Gene regulatory networks - Statistical methods. | -
dc.title | A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b4976425 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2013 | -
