MetaCluster: unsupervised binning of environmental genomic fragments and taxonomic annotation

Yang, B; Peng, Y; Leung, HCM; Yiu, SM; Qin, J; Li, R; Chin, FYL

Publisher Website: 10.1145/1854776.1854803
Scopus: eid_2-s2.0-77958056824

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: MetaCluster: unsupervised binning of environmental genomic fragments and taxonomic annotation

Title	MetaCluster: unsupervised binning of environmental genomic fragments and taxonomic annotation
Authors	Yang, B; Peng, Y; Leung, HCM; Yiu, SM; Qin, J; Li, R; Chin, FYL
Keywords	Algorithms; Experimentation; Measurement; Performance; Reliability
Issue Date	2010
Publisher	Association for Computing Machinery.
Citation	The 1st ACM International Conference on Bioinformatics and Computational Biology (ACM-BCB 2010), Niagara Falls, N.Y., 2-4 August 2010. DOI: http://dx.doi.org/10.1145/1854776.1854803
Abstract	Limited by laboratory techniques, traditional microorganism research usually focuses on a single species. This significantly limits the in-depth analysis of intricate biological processes within complex microbial communities. With the rapid development of genome sequencing techniques, traditional research methods based on isolation and cultivation are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomic data analysis, which is also its major bottleneck, is the identification and taxonomic characterization of the DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as “binning”. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. Because reference genomes are of limited availability and markers can be biased and unstable, these methods may not be applicable in all cases. Few unsupervised binning methods have been reported, and their unsupervised nature makes it extremely difficult to annotate the resulting clusters with taxonomic labels. In this paper, we present MetaCluster 2.0, an unsupervised binning method that can bin metagenomic sequencing datasets with high accuracy, identify unknown genomes, and annotate them with proper taxonomic labels. MetaCluster 2.0 runs at least 30 times faster than existing binning algorithms.
Description	Proceedings of the 1st ACM International Conference on Bioinformatics and Computational Biology, 2010, p. 170-179
Persistent Identifier	http://hdl.handle.net/10722/129584
ISBN	978-1-4503-0438-2

DC Field	Value	Language
dc.contributor.author	Yang, B	en_US
dc.contributor.author	Peng, Y	en_US
dc.contributor.author	Leung, HCM	en_US
dc.contributor.author	Yiu, SM	en_US
dc.contributor.author	Qin, J	en_US
dc.contributor.author	Li, R	en_US
dc.contributor.author	Chin, FYL	en_US
dc.date.accessioned	2010-12-23T08:39:29Z	-
dc.date.available	2010-12-23T08:39:29Z	-
dc.date.issued	2010	en_US
dc.identifier.citation	The 1st ACM International Conference on Bioinformatics and Computational Biology (ACM-BCB 2010), Niagara Falls, N.Y., 2-4 August 2010.	en_US
dc.identifier.isbn	978-1-4503-0438-2	-
dc.identifier.uri	http://hdl.handle.net/10722/129584	-
dc.description	Proceedings of the 1st ACM International Conference on Bioinformatics and Computational Biology, 2010, p. 170-179	-
dc.description.abstract	Limited by laboratory techniques, traditional microorganism research usually focuses on a single species. This significantly limits the in-depth analysis of intricate biological processes within complex microbial communities. With the rapid development of genome sequencing techniques, traditional research methods based on isolation and cultivation are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomic data analysis, which is also its major bottleneck, is the identification and taxonomic characterization of the DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as “binning”. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. Because reference genomes are of limited availability and markers can be biased and unstable, these methods may not be applicable in all cases. Few unsupervised binning methods have been reported, and their unsupervised nature makes it extremely difficult to annotate the resulting clusters with taxonomic labels. In this paper, we present MetaCluster 2.0, an unsupervised binning method that can bin metagenomic sequencing datasets with high accuracy, identify unknown genomes, and annotate them with proper taxonomic labels. MetaCluster 2.0 runs at least 30 times faster than existing binning algorithms.	-
dc.language	eng	en_US
dc.publisher	Association for Computing Machinery.	-
dc.relation.ispartof	International Conference on Bioinformatics and Computational Biology	-
dc.rights	Proceedings of the 1st ACM International Conference on Bioinformatics and Computational Biology. Copyright © Association for Computing Machinery.	-
dc.subject	Algorithms	-
dc.subject	Experimentation	-
dc.subject	Measurement	-
dc.subject	Performance	-
dc.subject	Reliability	-
dc.title	MetaCluster: unsupervised binning of environmental genomic fragments and taxonomic annotation	en_US
dc.type	Conference_Paper	en_US
dc.identifier.email	Yang, B: byang@cs.hku.hk	en_US
dc.identifier.email	Peng, Y: ypeng@cs.hku.hk	en_US
dc.identifier.email	Leung, HCM: cmleung2@cs.hku.hk	en_US
dc.identifier.email	Yiu, SM: smyiu@cs.hku.hk	-
dc.identifier.email	Qin, J: qinjj@genomics.org.cn	-
dc.identifier.email	Li, R: lirq@genomics.org.cn	-
dc.identifier.email	Chin, FYL: chin@cs.hku.hk	-
dc.identifier.authority	Leung, HCM=rp00144	en_US
dc.identifier.authority	Yiu, SM=rp00207	en_US
dc.identifier.authority	Chin, FYL=rp00105	en_US
dc.description.nature	postprint	-
dc.identifier.doi	10.1145/1854776.1854803	-
dc.identifier.scopus	eid_2-s2.0-77958056824	-
dc.identifier.hkuros	177374	en_US
dc.identifier.spage	170	-
dc.identifier.epage	179	-
dc.identifier.scopusauthorid	Yang, B=35075583700	-
dc.identifier.scopusauthorid	Peng, Y=30267885400	-
dc.identifier.scopusauthorid	Leung, HCM=35233742700	-
dc.identifier.scopusauthorid	Yiu, SM=7003282240	-
dc.identifier.scopusauthorid	Qin, J=14039564900	-
dc.identifier.scopusauthorid	Li, R=34975581600	-
dc.identifier.scopusauthorid	Chin, FYL=7005101915	-
dc.identifier.citeulike	8820341	-
