postgraduate thesis: A pipeline for identification of NBS-encoding resistance genes
Title | A pipeline for identification of NBS-encoding resistance genes |
---|---|
Authors | Wu, Haiyang [吴海阳] |
Issue Date | 2015 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wu, H. [吴海阳]. (2015). A pipeline for identification of NBS-encoding resistance genes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760977 |
Abstract | Disease resistance genes (R-Genes) play the main role in the immune response of plants. Next-generation sequencing technology has accelerated the exploration of plant genomic data, providing an opportunity to identify and analyze R-Genes at genome-wide scale. Discovering the structure of R-Genes and their loci provides insight into the function and evolution of this gene family, and should lead to novel strategies for disease control, especially in crop breeding research.
Here, we developed a four-part pipeline to enable genome-wide identification and analysis of NBS-encoding R-Genes. Searching of peptide data was based on a novel reiterative workflow that builds a stable species-specific Hidden Markov model (ssHMM), which significantly increased sensitivity. Meanwhile, a validation step served mainly to ensure specificity. To facilitate biological research on R-Genes, the pipeline identifies not only the R-Gene proteins but also their homologs, including pseudogenes and homologous sequences containing putative mutations. Additionally, the pipeline provides motif classification, which helps researchers evaluate the results first-hand.
For the evaluation of performance, in experiments on both Arabidopsis thaliana and Oryza sativa, over 90% of expressed R-Genes were detected as having the functional motifs related to disease resistance genes. In the Arabidopsis thaliana experiment in particular, only one peptide sequence was missed in the motif classification result, and over 90% of identified proteins were confirmed by public annotation. Furthermore, our results reproduce the characteristic R-Gene distribution reported in previous studies, namely a pattern of significant numbers of R-Gene clusters.
In our experiments, the pipeline was capable of automatically identifying R-Genes using only public peptide data and genome sequences. Moreover, the main functions of the pipeline were built to run locally with multiple threads. The whole process can be completed within hours on most UNIX-like systems, which makes the pipeline particularly useful for R-Gene research, especially for comparative studies of R-Genes across several plants. |
Degree | Master of Philosophy |
Subject | Markov processes; Plants - Disease and pest resistance - Genetic aspects; Plant diseases - Genetic aspects; Computational biology |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/226823 |
HKU Library Item ID | b5760977 |
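The reiterative ssHMM workflow described in the abstract can be read as a fixed-point loop: search the proteome with the current model, rebuild the model from the hits, and stop once the hit set no longer changes. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The real pipeline builds profile HMMs, whereas here a hypothetical toy model (a set of 3-mers) stands in for the HMM. The names `reiterative_search`, `search` and `build`, the 3-mer size, the `min_shared` threshold, the majority rule and the toy sequences are all illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

# Hypothetical toy "model": a set of 3-mers standing in for a profile HMM.
Model = FrozenSet[str]


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All distinct k-mers of a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search(model: Model, proteome: Dict[str, str], min_shared: int = 3) -> Set[str]:
    """Toy search (assumption): a protein is a hit if it shares >= min_shared k-mers with the model."""
    return {pid for pid, seq in proteome.items() if len(kmers(seq) & model) >= min_shared}


def build(seqs: Sequence[str]) -> Model:
    """Toy model building (assumption): keep k-mers found in at least half of the hit sequences."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers(s))  # each sequence contributes each k-mer at most once
    return frozenset(km for km, c in counts.items() if 2 * c >= len(seqs))


def reiterative_search(
    seed_model: Model,
    proteome: Dict[str, str],
    search_fn: Callable[[Model, Dict[str, str]], Set[str]],
    build_fn: Callable[[Sequence[str]], Model],
    max_rounds: int = 10,
) -> Tuple[Model, FrozenSet[str], int]:
    """Alternate search and rebuild until the hit set is stable; returns (model, hits, rebuilds)."""
    model = seed_model
    hits = frozenset(search_fn(model, proteome))
    for rounds in range(1, max_rounds + 1):
        if not hits:
            return model, hits, rounds - 1  # nothing to rebuild from
        model = build_fn([proteome[pid] for pid in sorted(hits)])
        new_hits = frozenset(search_fn(model, proteome))
        if new_hits == hits:  # converged: the species-specific model is "stable"
            return model, hits, rounds
        hits = new_hits
    return model, hits, max_rounds


# Toy data (hypothetical); the seed is loosely modelled on an NBS P-loop-like motif.
SEED = frozenset(kmers("GGVGKTT"))
PROTEOME = {
    "p1": "MAGGVGKTTLAR",
    "p2": "SSGGVGKSSLAR",
    "p3": "QQQQKSSLARWW",  # missed by the seed, recovered after one rebuild
    "p4": "PPPPPPPP",
}

model, hits, rounds = reiterative_search(SEED, PROTEOME, search, build)
print(sorted(hits), rounds)  # -> ['p1', 'p2', 'p3'] 2
```

In this toy run, `p3` shares no k-mers with the seed but is recovered once the model is rebuilt from the first hits. This mirrors, in miniature, how re-estimating a species-specific model could raise sensitivity. A specificity check such as the pipeline's validation step would still be needed, because each rebuild can also admit false positives.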
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wu, Haiyang | - |
dc.contributor.author | 吴海阳 | - |
dc.date.accessioned | 2016-07-05T23:16:51Z | - |
dc.date.available | 2016-07-05T23:16:51Z | - |
dc.date.issued | 2015 | - |
dc.identifier.citation | Wu, H. [吴海阳]. (2015). A pipeline for identification of NBS-encoding resistance genes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760977 | - |
dc.identifier.uri | http://hdl.handle.net/10722/226823 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Markov processes | - |
dc.subject.lcsh | Plants - Disease and pest resistance - Genetic aspects | - |
dc.subject.lcsh | Plant diseases - Genetic aspects | - |
dc.subject.lcsh | Computational biology | - |
dc.title | A pipeline for identification of NBS-encoding resistance genes | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5760977 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5760977 | - |
dc.identifier.mmsid | 991019899359703414 | - |