postgraduate thesis: Computational approaches for protein functions and gene association networks

Title: Computational approaches for protein functions and gene association networks
Authors: Yalamanchili, Hari Krishna
Advisors: Wang, JJ; Chin, FYL
Issue Date: 2014
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yalamanchili, H. K. (2014). Computational approaches for protein functions and gene association networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5317040
Abstract: Molecular biology revolves primarily around proteins and genes (DNA and RNA), which work together to drive various biomolecular systems. Thus, to comprehend any biological phenomenon, from basic cell division to a disease as complex as cancer, it is fundamental to decode the functional dynamics of proteins and genes. Computational approaches are now widely used to supplement traditional experimental approaches. However, each automated approach has its own advantages and limitations. In this thesis, major shortcomings of existing computational approaches are identified, and alternative fast yet precise methods are proposed. First, a strong need for reliable automated protein function prediction is identified. Almost half of protein functional interpretations are enigmatic, and the lack of a universal functional vocabulary further aggravates the problem. NRProF, a novel neural-response-based method, is proposed for protein functional annotation. The neural response algorithm simulates the human brain in classifying images; the same approach is applied here to classify proteins. Using the Gene Ontology (GO) hierarchical structure as background, NRProF classifies a protein of interest into a specific GO category and thus assigns the corresponding function. With reliable protein functional annotations established, protein and gene collaborations are studied next. Interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental to gene regulation and are highly specific, even against an evolutionary background. To explain this binding specificity, a Co-Evo (co-evolutionary) relationship is hypothesized. Pearson correlation and Mutual Information (MI) metrics are used to validate the hypothesis. Residue-level MI is used to infer the specific binding residues of TFs and the corresponding TFBSs, supporting a thorough understanding of the gene regulatory mechanism and aiding targeted gene therapies.
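As an illustration of the residue-level MI idea mentioned above, the following minimal Python sketch computes the mutual information between one TF alignment column and one TFBS alignment column observed across the same set of species. The column strings and the function name `mutual_information` are hypothetical, and the abstract does not state which estimator, alignments, or small-sample corrections the thesis actually uses, so this should be read only as a sketch of the general metric.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of symbols in xs
    count_y = Counter(ys)            # marginal counts of symbols in ys
    count_xy = Counter(zip(xs, ys))  # joint counts of paired symbols
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy columns, one symbol per species (eight species).
tf_column = "RRRRKKKK"         # amino acid at one TF alignment position
tfbs_column = "GGGGAAAA"       # nucleotide at one TFBS alignment position
unrelated_column = "ACACACAC"  # a column that varies independently of the TF column

coupled_mi = mutual_information(tf_column, tfbs_column)
independent_mi = mutual_information(tf_column, unrelated_column)
```

In this toy data the TF and TFBS columns change together, so their MI is high, while the independently varying column yields an MI of zero. Ranking column pairs by such a score is one plausible way to nominate candidate contact residues.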
After TF and TFBS associations are characterized, the interplay between genes is abstracted as Gene Regulatory Networks. Several methods using expression correlations have been proposed to infer gene networks. However, most of them ignore the embedded dynamic delay induced by complex molecular interactions and other riotous cellular mechanisms involved in gene regulation. The delay is especially evident in high-frequency time-series expression data. DDGni, a novel network inference strategy, is proposed that adopts the gapped Smith-Waterman algorithm. Gaps accommodate expression delays, and local alignment unveils short regulatory windows that traditional methods overlook. In addition to gene-level expression data, recent studies have demonstrated the merits of exon-level RNA-Seq data in profiling splice variants and constructing gene networks. However, the large number of exons relative to the small sample size limits their practical application. SpliceNet, a novel method based on Large Dimensional Trace, is proposed to infer isoform-specific co-expression networks from exon-level RNA-Seq data. It provides a more comprehensive picture of complex diseases by inferring network rewiring between normal and diseased samples at isoform resolution. It can be applied to any exon-level RNA-Seq data and exon array data. In summary, this thesis first identifies major shortcomings of existing computational approaches to the functional association of proteins and genes, and then develops several tools, viz. NRProF, Co-Evo, DDGni and SpliceNet. Collectively, they offer a comprehensive picture of the biomolecular system under study.
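The idea of aligning expression profiles with a gapped local alignment can be sketched as follows. This is not the DDGni implementation: the abstract does not describe its encoding or scoring, so the discretization into up/down/no-change symbols, the function names `discretize` and `smith_waterman`, the scoring parameters, and the toy series are all assumptions made for illustration. The alignment itself is the standard Smith-Waterman recurrence with a linear gap penalty.

```python
def discretize(series, tol=0.1):
    """Encode successive changes as 'U' (up), 'D' (down) or 'N' (no change)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        if diff > tol:
            symbols.append("U")
        elif diff < -tol:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            # Flooring at zero lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


# Hypothetical toy profiles: the target repeats the regulator's pulse two time points later.
regulator = [0, 1, 2, 1, 0, 0, 0]
target = [0, 0, 0, 1, 2, 1, 0]

regulator_pattern = discretize(regulator)  # "UUDDNN"
target_pattern = discretize(target)        # "NNUUDD"
alignment_score = smith_waterman(regulator_pattern, target_pattern)
```

A plain time-point-by-time-point correlation of these two toy series would be weakened by the two-step shift, whereas the local alignment still recovers the shared "UUDD" window. That is the intuition behind using local alignment to expose short, delayed regulatory windows.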
Degree: Doctor of Philosophy
Subject: Gene regulatory networks - Data processing; Proteins - Data processing
Dept/Program: Biochemistry
Persistent Identifier: http://hdl.handle.net/10722/206477

 

DC Field | Value | Language
dc.contributor.advisor | Wang, JJ | -
dc.contributor.advisor | Chin, FYL | -
dc.contributor.author | Yalamanchili, Hari Krishna | -
dc.date.accessioned | 2014-10-31T23:15:59Z | -
dc.date.available | 2014-10-31T23:15:59Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | Yalamanchili, H. K. (2014). Computational approaches for protein functions and gene association networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5317040 | -
dc.identifier.uri | http://hdl.handle.net/10722/206477 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject.lcsh | Gene regulatory networks - Data processing | -
dc.subject.lcsh | Proteins - Data processing | -
dc.title | Computational approaches for protein functions and gene association networks | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5317040 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Biochemistry | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5317040 | -
