Conference Paper: A more accurate and efficient whole genome phylogeny

Title: A more accurate and efficient whole genome phylogeny
Authors: Chan, PY; Lam, TW; Yiu, SM; Liu, CM
Issue Date: 2006
Publisher: World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml
Citation: Series On Advances In Bioinformatics And Computational Biology, 2006, v. 3, p. 337-352
Abstract: To reconstruct a phylogeny for a given set of species, most previous approaches rely on similarity information derived from a subset of conserved regions (or genes) in the corresponding genomes. In some cases, the regions chosen may not reflect the evolutionary history of the species and may be too restricted to differentiate the species. It is generally believed that the inference could be more accurate if whole genomes were considered. The best existing solution that makes use of complete genomes was proposed by Henz et al. [13]. They can construct a phylogeny for 91 prokaryotic genomes in 170 CPU hours with an accuracy of about 70% (based on the measurement of non-trivial splits), while other approaches that use whole genomes can deal with no more than 20 species. Note that Henz et al. measure the distance between the species using BLASTN, which is not primarily designed for whole genome alignment. Also, their approach is not scalable: for example, it would probably take over 1000 CPU hours to construct a phylogeny for all 230 prokaryotic genomes published by NCBI. In addition, we found that the measurement of non-trivial splits is only a rough indicator of the accuracy of a phylogeny. In this paper, we propose the following. (1) To evaluate the quality of a phylogeny with respect to a model answer, we suggest using the concept of the maximum agreement subtree, as it can capture the structure of the phylogeny. (2) We propose using whole genome alignment software (such as MUMmer) to measure the distances between the species, and we derive an efficient approach to generate these distances. From experiments on real data sets, we found that our approach is more accurate and more scalable than that of Henz et al. We can construct a phylogenetic tree for the same set of 91 genomes with an accuracy more than 20% higher (with respect to both evaluation measures) in 2 CPU hours (more than 80 times faster than their approach). Also, our approach is scalable and can construct a phylogeny for 230 prokaryotic genomes with accuracy as high as 85% in only 9.5 CPU hours.
Persistent Identifier: http://hdl.handle.net/10722/93474
ISBN: 978-186094623-3
ISSN: 1751-6404

DC Field | Value | Language
dc.contributor.author | Chan, PY | en_HK
dc.contributor.author | Lam, TW | en_HK
dc.contributor.author | Yiu, SM | en_HK
dc.contributor.author | Liu, CM | en_HK
dc.date.accessioned | 2010-09-25T15:02:15Z | -
dc.date.available | 2010-09-25T15:02:15Z | -
dc.date.issued | 2006 | en_HK
dc.identifier.citation | Series On Advances In Bioinformatics And Computational Biology, 2006, v. 3, p. 337-352 | en_HK
dc.identifier.isbn | 978-186094623-3 | -
dc.identifier.issn | 1751-6404 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/93474 | -
dc.language | eng | en_HK
dc.publisher | World Scientific Publishing Co Pte Ltd. The Journal's web site is located at http://www.worldscibooks.com/series/abcb_series.shtml | en_HK
dc.relation.ispartof | Series on Advances in Bioinformatics and Computational Biology | en_HK
dc.title | A more accurate and efficient whole genome phylogeny | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | en_HK
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_HK
dc.identifier.authority | Lam, TW=rp00135 | en_HK
dc.identifier.authority | Yiu, SM=rp00207 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-84863050946 | en_HK
dc.identifier.hkuros | 118587 | en_HK
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856987338&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 3 | en_HK
dc.identifier.spage | 337 | en_HK
dc.identifier.epage | 352 | en_HK
dc.publisher.place | Singapore | en_HK
dc.identifier.scopusauthorid | Chan, PY=26435793700 | en_HK
dc.identifier.scopusauthorid | Lam, TW=7202523165 | en_HK
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK
dc.identifier.scopusauthorid | Liu, CM=54984108400 | en_HK
dc.identifier.issnl | 1751-6404 | -
