postgraduate thesis: A study of clade-specific genomic elements in prokaryotes

Title: A study of clade-specific genomic elements in prokaryotes
Authors: Ho, Chi-chun (何子雋)
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation
Ho, C. [何子雋]. (2013). A study of clade-specific genomic elements in prokaryotes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5558976
Abstract: Bacteria and Archaea, the two prokaryotic domains, are the most ubiquitous life forms on Earth and currently the most richly sampled organisms in terms of genome sequences. Understanding the genetic determinants that define this biodiversity can provide crucial insight into the underlying evolutionary processes. In the pre-genomic era, identification of clade-specific genomic regions relied on in vitro techniques that were limited by the difficulty of isolating an organism, by sequence divergence and by the experimental systems available for it. The advent of high-throughput DNA sequencing made vast amounts of genome data publicly available. It allowed complex homology searches and subtractions to be performed reproducibly, divergent homology to be recognized, and non-cultivable organisms to be studied. On the assumption that evolution can lead to conserved, clade-restricted genomic characters in traditionally delimited monophyletic species-groups of medically important bacteria, it was hypothesized that such elements could be identified from genome sequences using a computational approach. It was also hypothesized that the identified elements would corroborate existing classifications. Using 23, four and 16 genome sequences of the bioterrorism agent Burkholderia pseudomallei, its avirulent relative Burkholderia thailandensis and the opportunistic pathogens of the Burkholderia cepacia complex, respectively, conserved, clade-specific gene targets were identified with a manual iterated BLAST approach. The identified targets were used in a multiplex PCR assay that correctly identified all 43 B. pseudomallei, seven B. thailandensis and 20 B. cepacia complex organisms tested. Together, these results supported the hypotheses at the species and species-group levels. To further test the hypotheses below the species level, a semi-automated GUI program and an automated, webserver-based implementation, ssGeneFinder Webserver (http://147.8.74.24/ssGeneFinder/), were developed from the prototype algorithm.
Clade-specific genomic elements were annotated for an outbreak strain of Escherichia coli O104:H4 using nine genomes with the GUI software, and were validated using 96, 39 and 10 strains of E. coli, Salmonella and Shigella species, respectively. They were also annotated for the typhoid fever agent Salmonella enterica subspecies enterica serovar Typhi using 11 complete and draft genomes on the ssGeneFinder Webserver, and were validated in 40 S. Typhi, 110 non-Typhi Salmonella and 115 other Enterobacteriaceae isolates. The hypotheses were thus further supported at the serovar and strain levels. Based on the observation that different species are subject to niche-specific selective pressure, which may affect the abundance of their clade-specific genomic elements, the distribution of such elements was hypothesized to be non-random. An annotation pipeline, FindORFans, was designed by improving the ssGeneFinder algorithm based on an analysis of 240 ssGeneFinder Webserver usage statistics records. The large-scale annotation identified elements that were analysed in the context of genome size, GC content, lifestyle and medical importance. The total number and total length of ORFans were significantly correlated with proteome size (R = 0.696 and 0.646, respectively; P < 0.001). The size-normalized ORFan abundance was negatively correlated with genome G+C content (R = -0.245; P < 0.001). Obligate intracellular organisms had a higher genomic fraction of ORFans than facultative intracellular organisms (P < 0.05) and extracellular organisms (P < 0.001). These results supported a niche-specific distribution of ORFans in prokaryotes. Reduced interspecific recombination and Darwinian selection were suggested as possible causes of the relative enrichment of ORFans in obligate intracellular prokaryotes.
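
The selection criterion described in the abstract (gene targets conserved within a clade and absent outside it) can be illustrated with a minimal sketch. This is not the ssGeneFinder or FindORFans implementation, whose details are not given in this record. It assumes homology hits have already been computed (for example with BLAST) and summarized as a mapping from each reference gene to the genomes in which a hit was found. The function name `clade_specific_genes` and all gene and genome names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def clade_specific_genes(
    hits: Dict[str, Set[str]],
    in_group: Iterable[str],
    out_group: Iterable[str],
) -> List[str]:
    """Return genes with a homolog in every in-group genome and in no out-group genome.

    `hits` maps a reference gene ID to the set of genome names in which a
    homology search reported a hit above some chosen threshold (assumed
    to have been computed beforehand).
    """
    in_set, out_set = set(in_group), set(out_group)
    selected = []
    for gene, genomes in hits.items():
        conserved = in_set <= genomes          # present in all in-group genomes
        restricted = not (out_set & genomes)   # absent from all out-group genomes
        if conserved and restricted:
            selected.append(gene)
    return sorted(selected)


# Hypothetical toy data; gene and genome names are invented for illustration.
toy_hits = {
    "geneA": {"Bp1", "Bp2", "Bp3"},
    "geneB": {"Bp1", "Bp2", "Bp3", "Bt1"},  # also found in an out-group genome
    "geneC": {"Bp1", "Bp3"},                # missing from one in-group genome
    "geneD": {"Bp1", "Bp2", "Bp3"},
}
targets = clade_specific_genes(toy_hits, ["Bp1", "Bp2", "Bp3"], ["Bt1", "Bc1"])
print(targets)  # ['geneA', 'geneD']
```

In this toy example only `geneA` and `geneD` pass both filters. A real analysis would also need to choose similarity thresholds and handle draft genomes, which the sketch leaves out.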
Degree: Doctor of Philosophy
Subject: Prokaryotes
Dept/Program: Microbiology
Persistent Identifier: http://hdl.handle.net/10722/216269

 

DC Field | Value | Language
dc.contributor.author | Ho, Chi-chun | -
dc.contributor.author | 何子雋 | -
dc.date.accessioned | 2015-09-08T23:11:35Z | -
dc.date.available | 2015-09-08T23:11:35Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Ho, C. [何子雋]. (2013). A study of clade-specific genomic elements in prokaryotes. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5558976 | -
dc.identifier.uri | http://hdl.handle.net/10722/216269 | -
dc.description.abstract | (identical to the Abstract above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Prokaryotes | -
dc.title | A study of clade-specific genomic elements in prokaryotes | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5558976 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Microbiology | -
dc.description.nature | published_or_final_version | -
