
Article: MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning

Title: MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning
Authors: Wang, Y; Leung, HCM; Yiu, SM; Chin, FYL
Issue Date: 2014
Publisher: BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcgenomics/
Citation: BMC Genomics, 2014, v. 15 n. Suppl 1, p. article no. S12
Abstract:
Background: Taxonomic annotation of reads is an important problem in metagenomic analysis. Existing annotation tools, which align each read to the taxonomic structure, cannot annotate many reads efficiently and accurately because reads (100 bp) are short and most of them come from unknown genomes. Previous work has suggested assembling the reads into longer contigs before annotation. More reads/contigs can then be annotated, because a longer contig (in Kbp) can be aligned to a taxon even if it is from an unknown species, as long as it contains a conserved region of that taxon. Unfortunately, existing metagenomic assembly tools are not mature enough to produce sufficiently long contigs. Binning tries to group reads/contigs of similar species together. Intuitively, reads in the same group (cluster) should be annotated to the same taxon, and if the quality of binning is high, these reads together should cover a significant portion of the genome, alleviating the problem of short contigs. However, no existing work has tried to use binning results to help solve the annotation problem. This work explores this direction.

Results: In this paper, we describe MetaCluster-TA, an assembly-assisted, binning-based annotation tool that relies on the idea of annotating binned reads instead of aligning each read or contig to the taxonomic structure separately. We propose the novel concept of the 'virtual contig' (which can be up to 10 Kb in length) to represent a set of reads, and then represent each cluster as a set of 'virtual contigs' (which together can total up to 1 Mb in length) for annotation. MetaCluster-TA can outperform the widely used MEGAN4 and can annotate (1) more reads, since the virtual contigs are much longer; (2) more accurately, since each cluster of long virtual contigs contains global information about the sampled genome, which tends to be more accurate than short reads or assembled contigs that contain only local information about the genome; and (3) more efficiently, since there are far fewer long virtual contigs to align than short reads. MetaCluster-TA outperforms MetaCluster 5.0 as a binning tool, since binning itself can be more sensitive and precise given long virtual contigs, and the binning results can be improved using the reference taxonomic database.

Conclusions: MetaCluster-TA can outperform the widely used MEGAN4 and can annotate more reads with higher accuracy and higher efficiency. It also outperforms MetaCluster 5.0 as a binning tool.
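The central idea in the abstract, annotating a whole bin at once so that every read in it inherits one taxon, can be illustrated with a toy sketch. This is not MetaCluster-TA's actual algorithm, which the abstract does not specify. The function name `annotate_clusters`, the length-weighted vote, the `min_support` threshold, and the input layout are all assumptions made for illustration only.

```python
from collections import defaultdict


def annotate_clusters(contig_hits, read_to_cluster, min_support=0.5):
    """Toy cluster-level annotation (hypothetical, not the published method).

    contig_hits: iterable of (cluster_id, contig_length_bp, taxon_or_None),
        one entry per virtual contig; taxon is None when nothing aligned.
    read_to_cluster: dict mapping read id -> cluster id (the binning result).
    min_support: assumed fraction of a cluster's total contig length that
        must agree on a taxon before the cluster is annotated.
    Returns a dict mapping read id -> taxon (or None if unannotated).
    """
    votes = defaultdict(lambda: defaultdict(int))  # cluster -> taxon -> bp
    totals = defaultdict(int)                      # cluster -> total bp

    for cluster_id, length, taxon in contig_hits:
        totals[cluster_id] += length
        if taxon is not None:
            votes[cluster_id][taxon] += length

    cluster_taxon = {}
    for cluster_id, total in totals.items():
        if not votes[cluster_id]:
            cluster_taxon[cluster_id] = None
            continue
        # Pick the taxon backed by the most aligned base pairs.
        taxon, support = max(votes[cluster_id].items(), key=lambda kv: kv[1])
        cluster_taxon[cluster_id] = taxon if support / total >= min_support else None

    # Every read inherits the annotation of its cluster.
    return {read: cluster_taxon.get(cluster)
            for read, cluster in read_to_cluster.items()}


hits = [
    ("c1", 10000, "Escherichia"),
    ("c1", 8000, "Escherichia"),
    ("c1", 2000, None),   # a virtual contig with no alignment
    ("c2", 5000, None),
]
reads = {"r1": "c1", "r2": "c1", "r3": "c2"}
annotation = annotate_clusters(hits, reads)
```

In this made-up input, reads `r1` and `r2` both receive the taxon of cluster `c1` even though one of that cluster's virtual contigs did not align, while `r3` stays unannotated because nothing in `c2` aligned.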
Description: This article is part of the supplement: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics
Persistent Identifier: http://hdl.handle.net/10722/195943
ISSN: 1471-2164
2015 Impact Factor: 3.867
2015 SCImago Journal Rankings: 2.343
ISI Accession Number ID: WOS:000330693900012

 

DC Field | Value | Language
dc.contributor.author | Wang, Y | en_US
dc.contributor.author | Leung, HCM | en_US
dc.contributor.author | Yiu, SM | en_US
dc.contributor.author | Chin, FYL | en_US
dc.date.accessioned | 2014-03-21T02:26:14Z | -
dc.date.available | 2014-03-21T02:26:14Z | -
dc.date.issued | 2014 | en_US
dc.identifier.citation | BMC Genomics, 2014, v. 15 n. Suppl 1, p. article no. S12 | en_US
dc.identifier.issn | 1471-2164 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/195943 | -
dc.description | This article is part of the supplement: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics | -
dc.description.abstract | See Abstract above | en_US
dc.language | eng | en_US
dc.publisher | BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcgenomics/ | en_US
dc.relation.ispartof | BMC Genomics | en_US
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | en_US
dc.title | MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning | en_US
dc.type | Article | en_US
dc.identifier.email | Leung, HCM: cmleung2@cs.hku.hk | en_US
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_US
dc.identifier.email | Chin, FYL: chin@cs.hku.hk | en_US
dc.identifier.authority | Leung, HCM=rp00144 | en_US
dc.identifier.authority | Yiu, SM=rp00207 | en_US
dc.identifier.authority | Chin, FYL=rp00105 | en_US
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1186/1471-2164-15-S1-S12 | en_US
dc.identifier.pmid | 24564377 | -
dc.identifier.hkuros | 228334 | en_US
dc.identifier.volume | 15 | en_US
dc.identifier.issue | Suppl 1 | en_US
dc.identifier.spage | article no. S12 | en_US
dc.identifier.epage | article no. S12 | en_US
dc.identifier.isi | WOS:000330693900012 | -
dc.publisher.place | United Kingdom | en_US
