Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1089/cmb.2011.0276
- Scopus: eid_2-s2.0-84863049441
- PMID: 22300323
- WOS: WOS:000300041600012
Conference Paper: MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species
Title | MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species |
---|---|
Authors | Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL |
Keywords | Binning; Environmental Genomics; Metagenomics |
Issue Date | 2012 |
Publisher | Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb |
Citation | Journal Of Computational Biology, 2012, v. 19 n. 2, p. 241-249 |
Abstract | Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (possibly more than a hundred). None of the existing binning tools can handle all four factors. No tool, including AbundanceBin and MetaCluster 3.0, has demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc. |
Persistent Identifier | http://hdl.handle.net/10722/152031 |
ISSN | 1066-5277 (2023 Impact Factor: 1.4; 2023 SCImago Journal Rankings: 0.659) |
ISI Accession Number ID | WOS:000300041600012 |
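References | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856752234&selection=ref&src=s&origin=recordpage |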
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Y | en_US |
dc.contributor.author | Leung, HCM | en_US |
dc.contributor.author | Yiu, SM | en_US |
dc.contributor.author | Chin, FYL | en_US |
dc.date.accessioned | 2012-06-26T06:32:40Z | - |
dc.date.available | 2012-06-26T06:32:40Z | - |
dc.date.issued | 2012 | en_US |
dc.identifier.citation | Journal Of Computational Biology, 2012, v. 19 n. 2, p. 241-249 | en_US |
dc.identifier.issn | 1066-5277 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/152031 | - |
dc.description.abstract | Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering of reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (possibly more than a hundred). None of the existing binning tools can handle all four factors. No tool, including AbundanceBin and MetaCluster 3.0, has demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. © 2012 Mary Ann Liebert, Inc. | en_US |
dc.language | eng | en_US |
dc.publisher | Mary Ann Liebert, Inc Publishers. The Journal's web site is located at http://www.liebertpub.com/cmb | en_US |
dc.relation.ispartof | Journal of Computational Biology | en_US |
dc.rights | This is a copy of an article published in the Journal of Computational Biology © 2012 copyright Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://www.liebertonline.com. | - |
dc.subject | Binning | en_US |
dc.subject | Environmental Genomics | en_US |
dc.subject | Metagenomics | en_US |
dc.title | MetaCluster 4.0: A novel binning algorithm for NGS reads and huge number of species | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Leung, HCM:cmleung2@cs.hku.hk | en_US |
dc.identifier.email | Yiu, SM:smyiu@cs.hku.hk | en_US |
dc.identifier.email | Chin, FYL:chin@cs.hku.hk | en_US |
dc.identifier.authority | Leung, HCM=rp00144 | en_US |
dc.identifier.authority | Yiu, SM=rp00207 | en_US |
dc.identifier.authority | Chin, FYL=rp00105 | en_US |
dc.description.nature | published_or_final_version | en_US |
dc.identifier.doi | 10.1089/cmb.2011.0276 | en_US |
dc.identifier.pmid | 22300323 | - |
dc.identifier.scopus | eid_2-s2.0-84863049441 | en_US |
dc.identifier.hkuros | 208232 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856752234&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.volume | 19 | en_US |
dc.identifier.issue | 2 | en_US |
dc.identifier.spage | 241 | en_US |
dc.identifier.epage | 249 | en_US |
dc.identifier.isi | WOS:000300041600012 | - |
dc.publisher.place | United States | en_US |
dc.identifier.scopusauthorid | Wang, Y=54961432200 | en_US |
dc.identifier.scopusauthorid | Leung, HCM=35233742700 | en_US |
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_US |
dc.identifier.scopusauthorid | Chin, FYL=7005101915 | en_US |
dc.identifier.citeulike | 10311018 | - |
dc.identifier.issnl | 1066-5277 | - |