
Conference Paper: MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species

Title: MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species
Authors: Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL
Keywords: Binning; Environmental Genomics; Metagenomics
Issue Date: 2012
Publisher: Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb
Citation: Journal of Computational Biology, 2012, v. 19 n. 2, p. 241-249
Abstract: Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (possibly more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc.
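The abstract refers to the 4-mer distribution of a group of reads without defining it. As a rough illustration only, a 4-mer distribution can be read as the normalized frequency of each of the 256 possible length-4 substrings over A, C, G, T. The sketch below is not MetaCluster's implementation; the function name `kmer_distribution` and its parameters are hypothetical, and it makes the simplifying assumptions that windows containing ambiguous bases are skipped and that reverse complements are not merged.

```python
from collections import Counter
from itertools import product


def kmer_distribution(read, k=4):
    """Return the normalized k-mer frequencies of one DNA read.

    Hypothetical helper for illustration; not taken from MetaCluster.
    The result maps every possible k-mer over ACGT to its relative
    frequency (all zeros if the read has no valid window).
    """
    read = read.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of length k over the read, one base at a time.
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Assumption: skip windows with ambiguous bases such as N.
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


if __name__ == "__main__":
    dist = kmer_distribution("ACGTACGT")
    # The read has five 4-mer windows; ACGT occurs in two of them.
    print(len(dist), dist["ACGT"])
```

A distribution for a group of reads could be approximated by summing the per-read counts before normalizing; how MetaCluster 4.0 actually estimates it is described in the paper, not here.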
Persistent Identifier: http://hdl.handle.net/10722/152031
ISSN: 1066-5277
2023 Impact Factor: 1.4
2023 SCImago Journal Rankings: 0.659
ISI Accession Number ID: WOS:000300041600012

 

DC Field | Value | Language
dc.contributor.author | Wang, Y | en_US
dc.contributor.author | Leung, HCM | en_US
dc.contributor.author | Yiu, SM | en_US
dc.contributor.author | Chin, FYL | en_US
dc.date.accessioned | 2012-06-26T06:32:40Z | -
dc.date.available | 2012-06-26T06:32:40Z | -
dc.date.issued | 2012 | en_US
dc.identifier.citation | Journal Of Computational Biology, 2012, v. 19 n. 2, p. 241-249 | en_US
dc.identifier.issn | 1066-5277 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/152031 | -
dc.description.abstract | Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (possibly more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc. | en_US
dc.language | eng | en_US
dc.publisher | Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb | en_US
dc.relation.ispartof | Journal of Computational Biology | en_US
dc.rights | This is a copy of an article published in the Journal of Computational Biology © 2012 copyright Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://www.liebertonline.com. | -
dc.subject | Binning | en_US
dc.subject | Environmental Genomics | en_US
dc.subject | Metagenomics | en_US
dc.title | MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Leung, HCM: cmleung2@cs.hku.hk | en_US
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_US
dc.identifier.email | Chin, FYL: chin@cs.hku.hk | en_US
dc.identifier.authority | Leung, HCM=rp00144 | en_US
dc.identifier.authority | Yiu, SM=rp00207 | en_US
dc.identifier.authority | Chin, FYL=rp00105 | en_US
dc.description.nature | published_or_final_version | en_US
dc.identifier.doi | 10.1089/cmb.2011.0276 | en_US
dc.identifier.pmid | 22300323 | -
dc.identifier.scopus | eid_2-s2.0-84863049441 | en_US
dc.identifier.hkuros | 208232 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856752234&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 19 | en_US
dc.identifier.issue | 2 | en_US
dc.identifier.spage | 241 | en_US
dc.identifier.epage | 249 | en_US
dc.identifier.isi | WOS:000300041600012 | -
dc.publisher.place | United States | en_US
dc.identifier.scopusauthorid | Wang, Y=54961432200 | en_US
dc.identifier.scopusauthorid | Leung, HCM=35233742700 | en_US
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_US
dc.identifier.scopusauthorid | Chin, FYL=7005101915 | en_US
dc.identifier.citeulike | 10311018 | -
dc.identifier.issnl | 1066-5277 | -
