
Article: Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers

Title: Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
Authors: Yang, B; Peng, Y; Leung, HCM; Yiu, SM; Chen, JC; Chin, FYL
Issue Date: 2010
Publisher: BioMed Central Ltd.
Citation: BMC Bioinformatics, 2010, v. 11 suppl 2, article S5, 11 pp.
Abstract
BACKGROUND: With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) obtained by sequencing a sample of mixed species. This step is usually referred to as 'binning'. Existing binning methods are based on supervised or semi-supervised approaches, which rely heavily on reference genomes of known microorganisms and on phylogenetic marker genes. Owing to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. RESULTS: In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). Our experiments show that the method can accurately bin DNA fragments of various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of the method is its robustness to sequencing errors: binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. CONCLUSIONS: We provide a new and effective tool for the metagenome binning problem that uses no reference datasets and no marker information from any known reference genomes (species). The source code of our software tool, the reference genomes of the species used to generate the test datasets, and the corresponding test datasets are available at http://i.cs.hku.hk/alse/MetaCluster/.
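The abstract says only that reads are binned by the distribution of a selected set of l-mers, without reference genomes. As a rough illustration of that general idea, and not of the authors' MetaCluster method or its error-robust l-mer selection (which this record does not describe), the Python sketch below assumes plain frequency profiles over all 4^l possible l-mers and ordinary k-means clustering with a known number of species k. The helper names `lmer_profile`, `euclidean` and `kmeans` are hypothetical.

```python
# Hypothetical sketch: generic l-mer frequency profiles + plain k-means.
# This is NOT the MetaCluster algorithm; it only illustrates reference-free
# binning of reads by l-mer composition.
import math
import random
from collections import Counter
from itertools import product


def lmer_profile(read, l):
    """Return the normalized frequency of every possible l-mer (4**l of them) in a read.

    l-mers containing characters other than A, C, G, T (e.g. 'N') are ignored.
    """
    all_lmers = ["".join(p) for p in product("ACGT", repeat=l)]
    counts = Counter(read[i:i + l] for i in range(len(read) - l + 1))
    total = sum(counts[m] for m in all_lmers) or 1  # avoid division by zero
    return [counts[m] / total for m in all_lmers]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, max_iterations=50):
    """Cluster l-mer profiles with k-means; returns one bin label per read."""
    # Deterministic farthest-point seeding: start from the first vector, then
    # repeatedly add the vector farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each read goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy data (hypothetical): five AT-only reads and five GC-only reads.
    rng = random.Random(1)
    reads = ["".join(rng.choice("AT") for _ in range(200)) for _ in range(5)]
    reads += ["".join(rng.choice("GC") for _ in range(200)) for _ in range(5)]
    profiles = [lmer_profile(r, 2) for r in reads]
    print(kmeans(profiles, k=2))
```

The toy AT-only and GC-only reads share no l-mers, so their profiles are far apart and even plain k-means separates them. Reads from related species in a real metagenome overlap much more in composition, so this sketch says nothing about the accuracy or error robustness reported in the abstract.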
Description: Proceedings from the 3rd International Workshop on Data and Text Mining in Bioinformatics (DTMBio) 2009, Hong Kong, 6 November 2009
Persistent Identifier: http://hdl.handle.net/10722/152434
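ISSN: 1471-2105 (online)
PubMed Central ID: PMC3165929
ISI Accession Number ID: WOS:000276812300005
References: http://www.scopus.com/mlt/select.url?eid=2-s2.0-77952894198&selection=ref&src=s&origin=recordpage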

 

DC Field | Value | Language
dc.contributor.author | Yang, B | en_US
dc.contributor.author | Peng, Y | en_US
dc.contributor.author | Leung, HCM | en_US
dc.contributor.author | Yiu, SM | en_US
dc.contributor.author | Chen, JC | en_US
dc.contributor.author | Chin, FYL | en_US
dc.date.accessioned | 2012-06-26T06:39:00Z | -
dc.date.available | 2012-06-26T06:39:00Z | -
dc.date.issued | 2010 | en_US
dc.identifier.citation | BMC Bioinformatics, 2010, v. 11 suppl 2, article S5, 11 pp. | en_US
dc.identifier.issn | 1471-2105 (online) | en_US
dc.identifier.uri | http://hdl.handle.net/10722/152434 | -
dc.description | Proceedings : From 3rd International Workshop on Data and Text Mining in Bioinformatics (DTMBio) 2009, Hong Kong. 6 November 2009 | -
dc.language | eng | en_US
dc.publisher | BioMed Central Ltd. | en_US
dc.relation.ispartof | BMC Bioinformatics | en_US
dc.rights | BMC Bioinformatics. Copyright © BioMed Central Ltd. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject.mesh | Algorithms | en_US
dc.subject.mesh | Cluster Analysis | en_US
dc.subject.mesh | DNA - chemistry | en_US
dc.subject.mesh | Data Mining - methods | en_US
dc.subject.mesh | Databases, Genetic | en_US
dc.subject.mesh | Environmental Microbiology | en_US
dc.subject.mesh | Escherichia coli - genetics | en_US
dc.subject.mesh | Genome, Bacterial - genetics | en_US
dc.subject.mesh | Lactobacillus - genetics | en_US
dc.subject.mesh | Metagenomics - methods | en_US
dc.subject.mesh | Sequence Analysis, DNA - methods | en_US
dc.title | Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers | en_US
dc.type | Article | en_US
dc.identifier.email | Leung, HCM: cmleung2@cs.hku.hk | en_US
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_US
dc.identifier.email | Chin, FYL: chin@cs.hku.hk | en_US
dc.identifier.authority | Leung, HCM=rp00144 | en_US
dc.identifier.authority | Yiu, SM=rp00207 | en_US
dc.identifier.authority | Chin, FYL=rp00105 | en_US
dc.description.nature | published_or_final_version | en_US
dc.identifier.doi | 10.1186/1471-2105-11-S2-S5 | en_US
dc.identifier.pmid | 20406503 | en_US
dc.identifier.pmcid | PMC3165929 | -
dc.identifier.scopus | eid_2-s2.0-77952894198 | en_US
dc.identifier.hkuros | 177371 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-77952894198&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 11 | en_US
dc.identifier.issue | suppl 2 | en_US
dc.identifier.isi | WOS:000276812300005 | -
dc.publisher.place | United Kingdom | en_US
dc.identifier.scopusauthorid | Chin, FY=7005101915 | en_US
dc.identifier.scopusauthorid | Chen, JC=36439015600 | en_US
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_US
dc.identifier.scopusauthorid | Leung, HC=35233742700 | en_US
dc.identifier.scopusauthorid | Peng, Y=8713314400 | en_US
dc.identifier.scopusauthorid | Yang, B=7404472246 | en_US
dc.identifier.citeulike | 8210869 | -
dc.customcontrol.immutable | sml 140806 | -
