Postgraduate Thesis: Strategy for prokaryotic genome sequencing
Title: Strategy for prokaryotic genome sequencing

Authors: Jiang, Jingwei (江经纬)

Issue Date: 2011

Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
 
Abstract: Prokaryotes are single-celled microorganisms that can be further classified into bacteria and archaea. Their genetic material (DNA) is spread through the cytoplasm rather than enclosed in a nucleus. Unlike in eukaryotes, a high percentage of the prokaryotic genome is composed of genes. The evolution of prokaryotes differs from that of eukaryotes. Prokaryotes are found almost everywhere, including the harshest environments on Earth. Understanding the full picture of their genomes will aid new enzyme discovery, the decoding of drug resistance, biofuel development, and related work.
High-throughput sequencing technology is becoming increasingly popular in sequencing projects. Depending on the platform, scientists can obtain millions to billions of sequences within days. In the last two years, many prokaryotic genomes have been sequenced on these platforms. However, only 1,542 complete chromosomes have become available in NCBI GenBank (as of September 2011) since the first complete genome of Bacillus subtilis was published in 1997. The most difficult step in finishing a complete genome is closing all gaps between contigs. In this thesis, a series of comprehensive simulation studies based on these 1,542 complete chromosomes was performed in search of a cost-effective way to obtain complete prokaryotic genomes. Computer simulation provided solutions for both draft and complete genome sequencing. In addition, classification studies were performed to identify any prokaryotic phyla or orders that do not fit the proposed strategies.
Our results indicate that: 1) Low-coverage (6x-10x) pyrosequencing with long reads (400 bp) is sufficient to produce highly continuous and complete assemblies with a tiny proportion of false gene duplication/loss, so high-quality draft genomes can be generated by this strategy; 2) Long repeats influence assembly quality to some extent, especially genome coverage and contig number: the number of contigs and the genome coverage rate are significantly correlated with the total size of repeat regions; 3) With a combination of one run of single-end reads (10x, 400 bp read length) and one run of paired-end reads (10x, 8 kb library, 400 bp read length), ~90% of chromosome assemblies have fewer than 10 scaffolds and ~95% have fewer than 150 contigs. Most chromosomes can be assembled into high-quality draft chromosomes (on average <50 contigs, ~4 scaffolds, >370 kb contig N50 size, >99.99% single-base accuracy and <0.5% false gene duplication/loss rate); 4) Similar patterns found in both simulated and real reads imply that the simulation results are not overestimates; 5) Greater attention is needed for the orders Thiotrichales, Enterobacteriales and Nostocales when applying the above strategies to complete genome sequencing; 6) For prokaryotic species with multiple chromosomes, pulsed-field gel electrophoresis is needed to separate the chromosomes, which are then individually collected by electroelution prior to draft or complete genome sequencing.
This thesis presents a comprehensive computer simulation study based on 1,542 chromosomes (all complete prokaryotic chromosomes available in September 2011). The sequencing strategies it proposes for both draft and complete prokaryotic genomes could facilitate ongoing prokaryotic complete genome sequencing projects.
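
As a rough illustration of two quantities the abstract relies on (sequencing coverage and contig N50), the following Python sketch may help. It is not taken from the thesis: the function names, the 4 Mb genome size and the contig lengths are hypothetical values chosen only to show the arithmetic.

```python
import math
from typing import Sequence


def reads_for_coverage(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Smallest read count with reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical 4 Mb chromosome sequenced at 10x with 400 bp reads (illustrative only).
print(reads_for_coverage(4_000_000, 10, 400))  # 100000
# Hypothetical contig lengths, not data from the thesis.
print(n50([500_000, 400_000, 300_000, 200_000, 100_000]))  # 400000
```

Under these assumed numbers, 10x coverage of a 4 Mb chromosome with 400 bp reads corresponds to about 100,000 reads, and the example contig set has an N50 of 400 kb.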
 
Advisors: Leung, FCC

Degree: Doctor of Philosophy

Subject: Prokaryotes - Genetics.
Gene mapping.

Dept/Program: Biological Sciences
 
DC Field: Value
dc.contributor.advisor: Leung, FCC

dc.contributor.author: Jiang, Jingwei.

dc.contributor.author: 江经纬.

dc.date.hkucongregation: 2012

dc.date.issued: 2011
 
dc.description.nature: published_or_final_version

dc.description.thesisdiscipline: Biological Sciences

dc.description.thesislevel: doctoral

dc.description.thesisname: Doctor of Philosophy

dc.identifier.hkul: b4786990

dc.language: eng

dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)

dc.relation.ispartof: HKU Theses Online (HKUTO)

dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.

dc.rights: Creative Commons: Attribution 3.0 Hong Kong License

dc.source.uri: http://hub.hku.hk/bib/B47869902

dc.subject.lcsh: Prokaryotes - Genetics.

dc.subject.lcsh: Gene mapping.

dc.title: Strategy for prokaryotic genome sequencing

dc.type: PG_Thesis
 
<?xml version="1.0" encoding="utf-8"?>
<item><contributor.advisor>Leung, FCC</contributor.advisor>
<contributor.author>Jiang, Jingwei.</contributor.author>
<contributor.author>&#27743;&#32463;&#32428;.</contributor.author>
<date.issued>2011</date.issued>
<description.abstract>Prokaryotes are single-celled microorganisms that can be further classified into bacteria and archaea. Their genetic material (DNA) is spread through the cytoplasm rather than enclosed in a nucleus. Unlike in eukaryotes, a high percentage of the prokaryotic genome is composed of genes. The evolution of prokaryotes differs from that of eukaryotes. Prokaryotes are found almost everywhere, including the harshest environments on Earth. Understanding the full picture of their genomes will aid new enzyme discovery, the decoding of drug resistance, biofuel development, and related work.

High-throughput sequencing technology is becoming increasingly popular in sequencing projects. Depending on the platform, scientists can obtain millions to billions of sequences within days. In the last two years, many prokaryotic genomes have been sequenced on these platforms. However, only 1,542 complete chromosomes have become available in NCBI GenBank (as of September 2011) since the first complete genome of Bacillus subtilis was published in 1997. The most difficult step in finishing a complete genome is closing all gaps between contigs. In this thesis, a series of comprehensive simulation studies based on these 1,542 complete chromosomes was performed in search of a cost-effective way to obtain complete prokaryotic genomes. Computer simulation provided solutions for both draft and complete genome sequencing. In addition, classification studies were performed to identify any prokaryotic phyla or orders that do not fit the proposed strategies.

Our results indicate that: 1) Low-coverage (6x-10x) pyrosequencing with long reads (400 bp) is sufficient to produce highly continuous and complete assemblies with a tiny proportion of false gene duplication/loss, so high-quality draft genomes can be generated by this strategy; 2) Long repeats influence assembly quality to some extent, especially genome coverage and contig number: the number of contigs and the genome coverage rate are significantly correlated with the total size of repeat regions; 3) With a combination of one run of single-end reads (10x, 400 bp read length) and one run of paired-end reads (10x, 8 kb library, 400 bp read length), ~90% of chromosome assemblies have fewer than 10 scaffolds and ~95% have fewer than 150 contigs. Most chromosomes can be assembled into high-quality draft chromosomes (on average &lt;50 contigs, ~4 scaffolds, &gt;370 kb contig N50 size, &gt;99.99% single-base accuracy and &lt;0.5% false gene duplication/loss rate); 4) Similar patterns found in both simulated and real reads imply that the simulation results are not overestimates; 5) Greater attention is needed for the orders Thiotrichales, Enterobacteriales and Nostocales when applying the above strategies to complete genome sequencing; 6) For prokaryotic species with multiple chromosomes, pulsed-field gel electrophoresis is needed to separate the chromosomes, which are then individually collected by electroelution prior to draft or complete genome sequencing.

This thesis presents a comprehensive computer simulation study based on 1,542 chromosomes (all complete prokaryotic chromosomes available in September 2011). The sequencing strategies it proposes for both draft and complete prokaryotic genomes could facilitate ongoing prokaryotic complete genome sequencing projects.</description.abstract>
<language>eng</language>
<publisher>The University of Hong Kong (Pokfulam, Hong Kong)</publisher>
<relation.ispartof>HKU Theses Online (HKUTO)</relation.ispartof>
<rights>The author retains all proprietary rights, (such as patent rights) and the right to use in future works.</rights>
<rights>Creative Commons: Attribution 3.0 Hong Kong License</rights>
<source.uri>http://hub.hku.hk/bib/B47869902</source.uri>
<subject.lcsh>Prokaryotes - Genetics.</subject.lcsh>
<subject.lcsh>Gene mapping.</subject.lcsh>
<title>Strategy for prokaryotic genome sequencing</title>
<type>PG_Thesis</type>
<identifier.hkul>b4786990</identifier.hkul>
<description.thesisname>Doctor of Philosophy</description.thesisname>
<description.thesislevel>doctoral</description.thesislevel>
<description.thesisdiscipline>Biological Sciences</description.thesisdiscipline>
<description.nature>published_or_final_version</description.nature>
<date.hkucongregation>2012</date.hkucongregation>
<bitstream.url>http://hub.hku.hk/bitstream/10722/161550/1/FullText.pdf</bitstream.url>
</item>