Article: Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
Title: Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
 
Authors: Yang, B (2, 1)
Peng, Y (1)
Leung, HC (1)
Yiu, SM (1)
Chen, JC (1)
Chin, FY (1)
 
Issue Date: 2010
 
Publisher: BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/
 
Citation: BMC Bioinformatics, 2010, v. 11, Suppl. 2
DOI: http://dx.doi.org/10.1186/1471-2105-11-S2-S5
 
Abstract:
Background: With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches, which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases.
Results: In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). Our experiments show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of our method is its error robustness: the binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%.
Conclusions: We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information from any known reference genomes (species). The source code of our software tool, the reference genomes of the species used to generate the test datasets, and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/. © 2010 Yang and Chin; licensee BioMed Central Ltd.
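The Results paragraph describes binning reads by the distribution of l-mers and reports robustness to sequencing errors. The Python sketch below is a generic composition-based stand-in, not the authors' method: it uses all 4**l l-mers rather than the paper's carefully selected set, plain k-means as the unsupervised step, and an assumed model of independent substitution errors. Under that error model the expected share of error-free l-mers is (1 - rate)**l; for l = 4 and a 5% error rate that is about 81%. All function names (`lmer_profile`, `kmeans`, `add_substitution_errors`, `demo`, and the rest) and the synthetic GC-rich/AT-rich genomes are hypothetical illustrations.

```python
# Hypothetical sketch: profiles over ALL 4**l l-mers + k-means.
# This is NOT the paper's error-robust l-mer selection, only a generic stand-in.
import random
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def lmer_index(l):
    """Map each of the 4**l DNA l-mers to a fixed position in the profile vector."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=l))}


def lmer_profile(read, l=4):
    """Relative frequency of every l-mer in a read; l-mers containing non-ACGT symbols are skipped."""
    index = lmer_index(l)
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(read) - l + 1):
        pos = index.get(read[i:i + l])
        if pos is not None:
            counts[pos] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=50):
    """Lloyd's k-means with deterministic farthest-first seeding; returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_substitution_errors(read, rate, rng):
    """Assumed error model: each base is independently replaced by one of the other three with probability `rate`."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < rate else base
        for base in read
    )


def error_free_lmer_fraction(l, rate):
    """Expected share of l-mers untouched by independent substitutions: (1 - rate) ** l."""
    return (1 - rate) ** l


def random_genome(length, gc, rng):
    """Hypothetical i.i.d. genome with the given GC content (weights for A, C, G, T)."""
    at = (1 - gc) / 2
    return "".join(rng.choices("ACGT", weights=[at, gc / 2, gc / 2, at], k=length))


def demo(seed=7, read_len=800, reads_per_genome=20, error_rate=0.05, l=4):
    """Bin noisy reads sampled from two synthetic genomes; returns (predicted labels, true genome ids)."""
    rng = random.Random(seed)
    genomes = [random_genome(50_000, 0.7, rng), random_genome(50_000, 0.3, rng)]
    reads, truth = [], []
    for g, genome in enumerate(genomes):
        for _ in range(reads_per_genome):
            start = rng.randrange(len(genome) - read_len)
            reads.append(add_substitution_errors(genome[start:start + read_len], error_rate, rng))
            truth.append(g)
    labels = kmeans([lmer_profile(r, l) for r in reads], k=2)
    return labels, truth
```

Calling `demo()` samples 20 reads of 800 bp from each of two synthetic genomes (GC content 0.7 and 0.3), adds 5% substitution errors, and clusters their tetramer profiles. Such an easy, composition-separated toy case says nothing about how the paper's selected l-mers behave on related species or skewed abundance ratios; the actual tool is the one distributed at the MetaCluster URL in the abstract.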
 
ISSN: 1471-2105
2012 Impact Factor: 3.024
2012 SCImago Journal Rankings: 1.524
 
 
ISI Accession Number ID: WOS:000276812300005
 
 
DC Field: Value
dc.contributor.author: Yang, B
dc.contributor.author: Peng, Y
dc.contributor.author: Leung, HC
dc.contributor.author: Yiu, SM
dc.contributor.author: Chen, JC
dc.contributor.author: Chin, FY
dc.date.accessioned: 2012-06-26T06:39:00Z
dc.date.available: 2012-06-26T06:39:00Z
dc.date.issued: 2010
dc.description.nature: Link_to_subscribed_fulltext
dc.identifier.citation: BMC Bioinformatics, 2010, v. 11 SUPPL. 2
dc.identifier.citeulike: 8210869
dc.identifier.doi: http://dx.doi.org/10.1186/1471-2105-11-S2-S5
dc.identifier.eissn: 1471-2105
dc.identifier.isi: WOS:000276812300005
dc.identifier.issn: 1471-2105
dc.identifier.issue: SUPPL. 2
dc.identifier.pmid: 20406503
dc.identifier.scopus: eid_2-s2.0-77952894198
dc.identifier.uri: http://hdl.handle.net/10722/152434
dc.identifier.volume: 11
dc.language: eng
dc.publisher: BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/
dc.publisher.place: United Kingdom
dc.relation.ispartof: BMC Bioinformatics
dc.relation.references: http://www.scopus.com/mlt/select.url?eid=2-s2.0-77952894198&selection=ref&src=s&origin=recordpage
dc.subject.mesh: Algorithms
dc.subject.mesh: Cluster Analysis
dc.subject.mesh: DNA - Chemistry
dc.subject.mesh: Data Mining - Methods
dc.subject.mesh: Databases, Genetic
dc.subject.mesh: Environmental Microbiology
dc.subject.mesh: Escherichia coli - Genetics
dc.subject.mesh: Genome, Bacterial - Genetics
dc.subject.mesh: Lactobacillus - Genetics
dc.subject.mesh: Metagenomics - Methods
dc.subject.mesh: Sequence Analysis, DNA - Methods
dc.title: Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
dc.type: Article
 
<?xml version="1.0" encoding="utf-8"?>
<item><contributor.author>Yang, B</contributor.author>
<contributor.author>Peng, Y</contributor.author>
<contributor.author>Leung, HC</contributor.author>
<contributor.author>Yiu, SM</contributor.author>
<contributor.author>Chen, JC</contributor.author>
<contributor.author>Chin, FY</contributor.author>
<date.accessioned>2012-06-26T06:39:00Z</date.accessioned>
<date.available>2012-06-26T06:39:00Z</date.available>
<date.issued>2010</date.issued>
<identifier.citation>BMC Bioinformatics, 2010, v. 11 SUPPL. 2</identifier.citation>
<identifier.issn>1471-2105</identifier.issn>
<identifier.uri>http://hdl.handle.net/10722/152434</identifier.uri>
<description.abstract>Background: With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as &quot;binning&quot;. Existing binning methods are based on supervised or semi-supervised approaches, which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. Results: In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). Our experiments show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of our method is its error robustness: the binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. Conclusions: We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information from any known reference genomes (species). The source code of our software tool, the reference genomes of the species used to generate the test datasets, and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/. &#169; 2010 Yang and Chin; licensee BioMed Central Ltd.</description.abstract>
<language>eng</language>
<publisher>BioMed Central Ltd. The Journal&apos;s web site is located at http://www.biomedcentral.com/bmcbioinformatics/</publisher>
<relation.ispartof>BMC Bioinformatics</relation.ispartof>
<subject.mesh>Algorithms</subject.mesh>
<subject.mesh>Cluster Analysis</subject.mesh>
<subject.mesh>DNA - Chemistry</subject.mesh>
<subject.mesh>Data Mining - Methods</subject.mesh>
<subject.mesh>Databases, Genetic</subject.mesh>
<subject.mesh>Environmental Microbiology</subject.mesh>
<subject.mesh>Escherichia coli - Genetics</subject.mesh>
<subject.mesh>Genome, Bacterial - Genetics</subject.mesh>
<subject.mesh>Lactobacillus - Genetics</subject.mesh>
<subject.mesh>Metagenomics - Methods</subject.mesh>
<subject.mesh>Sequence Analysis, DNA - Methods</subject.mesh>
<title>Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers</title>
<type>Article</type>
<description.nature>Link_to_subscribed_fulltext</description.nature>
<identifier.doi>10.1186/1471-2105-11-S2-S5</identifier.doi>
<identifier.pmid>20406503</identifier.pmid>
<identifier.scopus>eid_2-s2.0-77952894198</identifier.scopus>
<relation.references>http://www.scopus.com/mlt/select.url?eid=2-s2.0-77952894198&amp;selection=ref&amp;src=s&amp;origin=recordpage</relation.references>
<identifier.volume>11</identifier.volume>
<identifier.issue>SUPPL. 2</identifier.issue>
<identifier.eissn>1471-2105</identifier.eissn>
<identifier.isi>WOS:000276812300005</identifier.isi>
<publisher.place>United Kingdom</publisher.place>
<identifier.citeulike>8210869</identifier.citeulike>
</item>
Author Affiliations
  1. The University of Hong Kong
  2. Southeast University