MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL

Publisher Website: 10.1093/bioinformatics/bts397
Scopus: eid_2-s2.0-84866458820
PMID: 22962452
WOS: WOS:000308532300008
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Title	MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample
Authors	Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL
Issue Date	2012
Publisher	Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/
Citation	The 11th European Conference on Computational Biology (ECCB'12), Basel, Switzerland, 9-12 September 2012. In Bioinformatics, 2012, v. 28 n. 18, p. i356-i362. DOI: http://dx.doi.org/10.1093/bioinformatics/bts397
Abstract	MOTIVATION: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. RESULTS: We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers: reads from high-abundance species are grouped with high confidence based on a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6x to 10x) and can achieve better sensitivity and specificity using less memory and running time. AVAILABILITY: http://i.cs.hku.hk/alse/MetaCluster/ CONTACT: chin@cs.hku.hk.
Description	All proceedings papers are available as open access at: OUP Bioinformatics (http://www.eccb12.org/proceedings-talks)
Persistent Identifier	http://hdl.handle.net/10722/165872
ISSN	1367-4803 (2023 Impact Factor: 4.4; 2023 SCImago Journal Rankings: 2.574)
PubMed Central ID	PMC3436824
ISI Accession Number ID	WOS:000308532300008
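
The abstract above describes grouping reads that share overlapping w-mers, first with a large w and then with a relaxed (shorter) w. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the function names `wmers` and `group_reads`, the `min_count` threshold (a crude stand-in for the paper's noise filtering), the toy reads and the w values are all assumptions made for illustration.

```python
from collections import defaultdict


def wmers(read, w):
    """Return the set of all length-w substrings (w-mers) of a read."""
    return {read[i:i + w] for i in range(len(read) - w + 1)}


def group_reads(reads, w, min_count=2):
    """Group reads that share at least one w-mer.

    A w-mer is only used for grouping if it occurs in at least
    `min_count` reads (hypothetical filter, not the paper's rule).
    Returns a sorted list of groups, each a list of read indices.
    """
    # Index: w-mer -> indices of the reads that contain it.
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for mer in wmers(read, w):
            index[mer].append(rid)

    # Union-find over read indices.
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge all reads that contain the same sufficiently frequent w-mer.
    for rids in index.values():
        if len(rids) < min_count:
            continue
        root = find(rids[0])
        for r in rids[1:]:
            parent[find(r)] = root

    groups = defaultdict(list)
    for rid in range(len(reads)):
        groups[find(rid)].append(rid)
    return sorted(groups.values())


# Toy reads (made up): reads 0 and 1 overlap, as do reads 2 and 3.
reads = ["ACGTACGT", "GTACGTTT", "CCCCGGGG", "CGGGGAAA"]

strict_groups = group_reads(reads, w=8)   # large w: strict grouping
relaxed_groups = group_reads(reads, w=5)  # shorter w: relaxed grouping
```

On these toy reads the strict setting leaves every read in its own group, while the relaxed setting merges the two overlapping pairs. This mirrors, in a very simplified way, why a shorter w helps when coverage is low, at the cost of a higher chance of spurious matches.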

DC Field	Value	Language
dc.contributor.author	Wang, Y	en_US
dc.contributor.author	Leung, HCM	en_US
dc.contributor.author	Yiu, SM	en_US
dc.contributor.author	Chin, FYL	en_US
dc.date.accessioned	2012-09-20T08:24:40Z	-
dc.date.available	2012-09-20T08:24:40Z	-
dc.date.issued	2012	en_US
dc.identifier.citation	The 11th European Conference on Computational Biology (ECCB'12), Basel, Switzerland, 9-12 September 2012. In Bioinformatics, 2012, v. 28 n. 18, p. i356-i362	en_US
dc.identifier.issn	1367-4803	-
dc.identifier.uri	http://hdl.handle.net/10722/165872	-
dc.description	All proceedings papers are available as open access at: OUP Bioinformatics (http://www.eccb12.org/proceedings-talks)	-
dc.description.abstract	MOTIVATION: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. RESULTS: We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers: reads from high-abundance species are grouped with high confidence based on a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6x to 10x) and can achieve better sensitivity and specificity using less memory and running time. AVAILABILITY: http://i.cs.hku.hk/alse/MetaCluster/ CONTACT: chin@cs.hku.hk.	-
dc.language	eng	en_US
dc.publisher	Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/	en_US
dc.relation.ispartof	Bioinformatics	en_US
dc.title	MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample	en_US
dc.type	Conference_Paper	en_US
dc.identifier.email	Wang, Y: h1095106@hku.hk	en_US
dc.identifier.email	Leung, HCM: cmleung2@cs.hku.hk	en_US
dc.identifier.email	Yiu, SM: smyiu@cs.hku.hk	en_US
dc.identifier.email	Chin, FYL: chin@cs.hku.hk	-
dc.identifier.authority	Leung, HCM=rp00144	en_US
dc.identifier.authority	Yiu, SM=rp00207	en_US
dc.identifier.authority	Chin, FYL=rp00105	en_US
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1093/bioinformatics/bts397	-
dc.identifier.pmid	22962452	-
dc.identifier.pmcid	PMC3436824	-
dc.identifier.scopus	eid_2-s2.0-84866458820	-
dc.identifier.hkuros	202743	en_US
dc.identifier.hkuros	211202	-
dc.identifier.volume	28	en_US
dc.identifier.issue	18	en_US
dc.identifier.spage	i356	en_US
dc.identifier.epage	i362	en_US
dc.identifier.isi	WOS:000308532300008	-
dc.publisher.place	United Kingdom	-
dc.description.other	The 11th European Conference on Computational Biology (ECCB'12), Basel, Switzerland, 9-12 September 2012. In Bioinformatics, 2012, v. 28 n. 18, p. i356-i362	-
dc.identifier.citeulike	11201412	-
dc.identifier.issnl	1367-4803	-
