postgraduate thesis: A pipeline for identification of NBS-encoding resistance genes

Title: A pipeline for identification of NBS-encoding resistance genes
Authors: Wu, Haiyang (吴海阳)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wu, H. [吴海阳]. (2015). A pipeline for identification of NBS-encoding resistance genes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760977
Abstract: Disease resistance genes (R-Genes) play a central role in the immune response of plants. Next-generation sequencing has accelerated the exploration of plant genomic data, which provides an opportunity to identify and analyze R-Genes at genome-wide scale. Discovering the structure of R-Genes and their loci provides insight into the function and evolution of this gene family, and should lead to novel strategies for disease control, especially in crop breeding research. Here, we developed a four-part pipeline to enable genome-wide identification and analysis of NBS-encoding R-Genes. The search of peptide data is based on a novel reiterative workflow that builds a stable species-specific hidden Markov model (ssHMM), which significantly increased sensitivity. A validation step, in turn, serves mainly to ensure specificity. To facilitate biological research on R-Genes, the pipeline identifies not only R-Gene proteins but also homologs of those genes, including pseudogenes and homologous sequences that contain putative mutations. The pipeline also provides motif classification, which helps researchers evaluate the results directly. In the performance evaluation, over 90% of expressed R-Genes were found to carry functional motifs related to disease resistance genes in the experiments on both Arabidopsis thaliana and Oryza sativa. In the Arabidopsis thaliana experiment in particular, only one peptide sequence was missed by motif classification, and over 90% of the identified proteins were confirmed by public annotation. Furthermore, our results showed an R-Gene distribution highly similar to that reported in previous studies, namely a pattern with a significant number of R-Gene clusters.
In our experiments, the pipeline was able to identify R-Genes automatically using only public peptide data and genome sequences. Moreover, the main functions of the pipeline were built to run locally and with multiple threads. The whole process can be completed in hours on most UNIX-like systems, which makes the pipeline particularly useful for R-Gene research, especially for comparing R-Genes across several plant species.
Degree: Master of Philosophy
Subject: Markov processes
Plants - Disease and pest resistance - Genetic aspects
Plant diseases - Genetic aspects
Computational biology
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/226823

 

DC Field | Value | Language
dc.contributor.author | Wu, Haiyang | -
dc.contributor.author | 吴海阳 | -
dc.date.accessioned | 2016-07-05T23:16:51Z | -
dc.date.available | 2016-07-05T23:16:51Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Wu, H. [吴海阳]. (2015). A pipeline for identification of NBS-encoding resistance genes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760977 | -
dc.identifier.uri | http://hdl.handle.net/10722/226823 | -
dc.description.abstract | Disease resistance genes (R-Genes) play a central role in the immune response of plants. Next-generation sequencing has accelerated the exploration of plant genomic data, which provides an opportunity to identify and analyze R-Genes at genome-wide scale. Discovering the structure of R-Genes and their loci provides insight into the function and evolution of this gene family, and should lead to novel strategies for disease control, especially in crop breeding research. Here, we developed a four-part pipeline to enable genome-wide identification and analysis of NBS-encoding R-Genes. The search of peptide data is based on a novel reiterative workflow that builds a stable species-specific hidden Markov model (ssHMM), which significantly increased sensitivity. A validation step, in turn, serves mainly to ensure specificity. To facilitate biological research on R-Genes, the pipeline identifies not only R-Gene proteins but also homologs of those genes, including pseudogenes and homologous sequences that contain putative mutations. The pipeline also provides motif classification, which helps researchers evaluate the results directly. In the performance evaluation, over 90% of expressed R-Genes were found to carry functional motifs related to disease resistance genes in the experiments on both Arabidopsis thaliana and Oryza sativa. In the Arabidopsis thaliana experiment in particular, only one peptide sequence was missed by motif classification, and over 90% of the identified proteins were confirmed by public annotation. Furthermore, our results showed an R-Gene distribution highly similar to that reported in previous studies, namely a pattern with a significant number of R-Gene clusters. In our experiments, the pipeline was able to identify R-Genes automatically using only public peptide data and genome sequences. Moreover, the main functions of the pipeline were built to run locally and with multiple threads. The whole process can be completed in hours on most UNIX-like systems, which makes the pipeline particularly useful for R-Gene research, especially for comparing R-Genes across several plant species. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Markov processes | -
dc.subject.lcsh | Plants - Disease and pest resistance - Genetic aspects | -
dc.subject.lcsh | Plant diseases - Genetic aspects | -
dc.subject.lcsh | Computational biology | -
dc.title | A pipeline for identification of NBS-encoding resistance genes | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5760977 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
