A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio

Leung, HCM; Yiu, SM; Yang, B; Peng, Y; Wang, Y; Liu, Z; Chen, J; Qin, J; Li, R; Chin, FYL

Publisher Website: 10.1093/bioinformatics/btr186
Scopus: eid_2-s2.0-79957877228
PMID: 21493653
WOS: WOS:000291062400007
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title

A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio

Authors

Leung, HCM; Yiu, SM; Yang, B; Peng, Y; Wang, Y; Liu, Z; Chen, J; Qin, J; Li, R; Chin, FYL

Issue Date

2011

Publisher

Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/

Citation

Bioinformatics, 2011, v. 27 n. 11, p. 1489-1495

DOI: http://dx.doi.org/10.1093/bioinformatics/btr186

Abstract

Motivation: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'. Binning algorithms that are based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms or phylogenetic markers. Due to the limited availability of reference genomes and the bias and low availability of markers, these algorithms may not be applicable in all cases. Unsupervised binning algorithms which can handle fragments from unknown species provide an alternative approach. However, existing unsupervised binning algorithms only work on datasets either with balanced species abundance ratios or rather different abundance ratios, but not both.

Results: In this article, we present MetaCluster 3.0, an integrated binning method based on the unsupervised top-down separation and bottom-up merging strategy, which can bin metagenomic fragments of species with very balanced abundance ratios (say 1:1) to very different abundance ratios (e.g. 1:24) with consistently higher accuracy than existing methods.

© The Author 2011. Published by Oxford University Press. All rights reserved.

Persistent Identifier

http://hdl.handle.net/10722/140792

ISSN

1367-4803

2023 Impact Factor: 4.4

2023 SCImago Journal Rankings: 2.574

ISI Accession Number ID

WOS:000291062400007

Funding Agency	Grant Number
GRF	HKU 719709E HKU 711611

Funding Information:

GRF grant (HKU 719709E, HKU 711611 and HKU SPACE Research Fund) in part.

Grants

Algorithms for Inferring k-articulated Phylogenetic Network

DC Field	Value	Language
dc.contributor.author	Leung, HCM	en_HK
dc.contributor.author	Yiu, SM	en_HK
dc.contributor.author	Yang, B	en_HK
dc.contributor.author	Peng, Y	en_HK
dc.contributor.author	Wang, Y	en_HK
dc.contributor.author	Liu, Z	en_HK
dc.contributor.author	Chen, J	en_HK
dc.contributor.author	Qin, J	en_HK
dc.contributor.author	Li, R	en_HK
dc.contributor.author	Chin, FYL	en_HK
dc.date.accessioned	2011-09-23T06:19:25Z	-
dc.date.available	2011-09-23T06:19:25Z	-
dc.date.issued	2011	en_HK
dc.identifier.citation	Bioinformatics, 2011, v. 27 n. 11, p. 1489-1495	en_HK
dc.identifier.issn	1367-4803	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/140792	-
dc.description.abstract	Motivation: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'. Binning algorithms that are based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms or phylogenetic markers. Due to the limited availability of reference genomes and the bias and low availability of markers, these algorithms may not be applicable in all cases. Unsupervised binning algorithms which can handle fragments from unknown species provide an alternative approach. However, existing unsupervised binning algorithms only work on datasets either with balanced species abundance ratios or rather different abundance ratios, but not both. Results: In this article, we present MetaCluster 3.0, an integrated binning method based on the unsupervised top-down separation and bottom-up merging strategy, which can bin metagenomic fragments of species with very balanced abundance ratios (say 1:1) to very different abundance ratios (e.g. 1:24) with consistently higher accuracy than existing methods. © The Author 2011. Published by Oxford University Press. All rights reserved.	en_HK
dc.language	eng	en_US
dc.publisher	Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/	en_HK
dc.relation.ispartof	Bioinformatics	en_HK
dc.rights	This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Bioinformatics following peer review. The definitive publisher-authenticated version Bioinformatics, 2011, v. 27 n. 11, p. 1489-1495 is available online at: http://bioinformatics.oxfordjournals.org/content/27/11/1489	-
dc.subject.mesh	Algorithms	-
dc.subject.mesh	Cluster Analysis	-
dc.subject.mesh	Metagenomics - methods	-
dc.subject.mesh	Sequence Analysis, DNA	-
dc.title	A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1367-4803&volume=27&issue=11&spage=1489&epage=1495&date=2011&atitle=A+robust+and+accurate+binning+algorithm+for+metagenomic+sequences+with+arbitrary+species+abundance+ratio	-
dc.identifier.email	Leung, HCM:cmleung2@cs.hku.hk	en_HK
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	en_HK
dc.identifier.email	Chin, FYL:chin@cs.hku.hk	en_HK
dc.identifier.authority	Leung, HCM=rp00144	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.identifier.authority	Chin, FYL=rp00105	en_HK
dc.description.nature	postprint	-
dc.identifier.doi	10.1093/bioinformatics/btr186	en_HK
dc.identifier.pmid	21493653	-
dc.identifier.scopus	eid_2-s2.0-79957877228	en_HK
dc.identifier.hkuros	192228	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-79957877228&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	27	en_HK
dc.identifier.issue	11	en_HK
dc.identifier.spage	1489	en_HK
dc.identifier.epage	1495	en_HK
dc.identifier.eissn	1460-2059	-
dc.identifier.isi	WOS:000291062400007	-
dc.publisher.place	United Kingdom	en_HK
dc.relation.project	Algorithms for Inferring k-articulated Phylogenetic Network	-
dc.identifier.scopusauthorid	Leung, HCM=35233742700	en_HK
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_HK
dc.identifier.scopusauthorid	Yang, B=54394737300	en_HK
dc.identifier.scopusauthorid	Peng, Y=54393903900	en_HK
dc.identifier.scopusauthorid	Wang, Y=54394522700	en_HK
dc.identifier.scopusauthorid	Liu, Z=54393630900	en_HK
dc.identifier.scopusauthorid	Chen, J=54392639400	en_HK
dc.identifier.scopusauthorid	Qin, J=14039564900	en_HK
dc.identifier.scopusauthorid	Li, R=34975581600	en_HK
dc.identifier.scopusauthorid	Chin, FYL=7005101915	en_HK
dc.identifier.citeulike	9157005	-
