MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species

Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL

Publisher Website: 10.1089/cmb.2011.0276
Scopus: eid_2-s2.0-84863049441
PMID: 22300323
WOS: WOS:000300041600012
Bookmarks:
- CiteULike: 5
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species

Title	MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species
Authors	Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL
Keywords	Binning; Environmental Genomics; Metagenomics
Issue Date	2012
Publisher	Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb
Citation	Journal of Computational Biology, 2012, v. 19 n. 2, p. 241-249. DOI: http://dx.doi.org/10.1089/cmb.2011.0276
Abstract	Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc.
Persistent Identifier	http://hdl.handle.net/10722/152031
ISSN	1066-5277 (2023 Impact Factor: 1.4; 2023 SCImago Journal Rankings: 0.659)
ISI Accession Number ID	WOS:000300041600012
References	References in Scopus
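
Illustration (not part of the repository record): the abstract refers to estimating the 4-mer distribution of a group of reads. The sketch below shows one plausible way to compute a normalized 4-mer frequency vector for a single read. It is not MetaCluster 4.0's actual procedure, which the abstract does not describe. The function name kmer_distribution, the skipping of windows that contain ambiguous bases, and the decision not to merge reverse complements are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(read, k=4):
    """Return the normalized k-mer frequency vector of a read.

    Hypothetical helper for illustration only. The vector has 4**k
    entries, ordered lexicographically (AAAA, AAAC, ..., TTTT for k=4).
    """
    # Every possible k-mer over the DNA alphabet, in a fixed order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    valid = set(ALPHABET)

    read = read.upper()
    counts = Counter()
    # Slide a window of width k along the read, one base at a time.
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        # Assumption: skip windows containing ambiguous bases such as N.
        if set(window) <= valid:
            counts[window] += 1

    total = sum(counts.values())
    if total == 0:
        # Read shorter than k, or no unambiguous window was found.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    dist = kmer_distribution("ACGTACGT")
    print(len(dist), sum(dist))
```

A vector for a whole group of reads could be obtained by summing the raw counts over its reads before normalizing; how MetaCluster 4.0 itself does this is described in the paper, not in this record.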

DC Field	Value	Language
dc.contributor.author	Wang, Y	en_US
dc.contributor.author	Leung, HCM	en_US
dc.contributor.author	Yiu, SM	en_US
dc.contributor.author	Chin, FYL	en_US
dc.date.accessioned	2012-06-26T06:32:40Z	-
dc.date.available	2012-06-26T06:32:40Z	-
dc.date.issued	2012	en_US
dc.identifier.citation	Journal Of Computational Biology, 2012, v. 19 n. 2, p. 241-249	en_US
dc.identifier.issn	1066-5277	en_US
dc.identifier.uri	http://hdl.handle.net/10722/152031	-
dc.description.abstract	Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc.	en_US
dc.language	eng	en_US
dc.publisher	Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb	en_US
dc.relation.ispartof	Journal of Computational Biology	en_US
dc.rights	This is a copy of an article published in the Journal of Computational Biology © 2012 copyright Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://www.liebertonline.com.	-
dc.subject	Binning	en_US
dc.subject	Environmental Genomics	en_US
dc.subject	Metagenomics	en_US
dc.title	MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species	en_US
dc.type	Conference_Paper	en_US
dc.identifier.email	Leung, HCM:cmleung2@cs.hku.hk	en_US
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	en_US
dc.identifier.email	Chin, FYL:chin@cs.hku.hk	en_US
dc.identifier.authority	Leung, HCM=rp00144	en_US
dc.identifier.authority	Yiu, SM=rp00207	en_US
dc.identifier.authority	Chin, FYL=rp00105	en_US
dc.description.nature	published_or_final_version	en_US
dc.identifier.doi	10.1089/cmb.2011.0276	en_US
dc.identifier.pmid	22300323	-
dc.identifier.scopus	eid_2-s2.0-84863049441	en_US
dc.identifier.hkuros	208232	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856752234&selection=ref&src=s&origin=recordpage	en_US
dc.identifier.volume	19	en_US
dc.identifier.issue	2	en_US
dc.identifier.spage	241	en_US
dc.identifier.epage	249	en_US
dc.identifier.isi	WOS:000300041600012	-
dc.publisher.place	United States	en_US
dc.identifier.scopusauthorid	Wang, Y=54961432200	en_US
dc.identifier.scopusauthorid	Leung, HCM=35233742700	en_US
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_US
dc.identifier.scopusauthorid	Chin, FYL=7005101915	en_US
dc.identifier.citeulike	10311018	-
dc.identifier.issnl	1066-5277	-
