A more accurate and efficient whole genome phylogeny

Chan, PY; Lam, TW; Yiu, SM; Liu, CM

Links for fulltext (may require subscription): Scopus eid_2-s2.0-84863050946

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: A more accurate and efficient whole genome phylogeny

Title	A more accurate and efficient whole genome phylogeny
Authors	Chan, PY; Lam, TW; Yiu, SM; Liu, CM
Issue Date	2006
Publisher	World Scientific Publishing Co Pte Ltd. The series' web site is located at http://www.worldscibooks.com/series/abcb_series.shtml
Citation	Series on Advances in Bioinformatics and Computational Biology, 2006, v. 3, p. 337-352
Abstract	To reconstruct a phylogeny for a given set of species, most previous approaches rely on similarity information derived from a subset of conserved regions (or genes) in the corresponding genomes. In some cases, the chosen regions may not reflect the evolutionary history of the species and may be too restricted to differentiate them. It is generally believed that the inference could be more accurate if whole genomes were considered. The best existing solution that uses complete genomes was proposed by Henz et al.[13] Their method constructs a phylogeny for 91 prokaryotic genomes in 170 CPU hours with an accuracy of about 70% (measured by non-trivial splits), whereas other whole-genome approaches can handle no more than 20 species. Note that Henz et al. measure the distance between species using BLASTN, which is not primarily designed for whole genome alignment. Their approach is also not scalable; for example, constructing a phylogeny for all 230 prokaryotic genomes published by NCBI would probably take over 1000 CPU hours. In addition, we found that non-trivial splits are only a rough indicator of the accuracy of a phylogeny. In this paper, we propose the following. (1) To evaluate the quality of a phylogeny with respect to a model answer, we suggest using the maximum agreement subtree, as it captures the structure of the phylogeny. (2) We propose using whole genome alignment software (such as MUMmer) to measure the distances between species, and we derive an efficient approach to generate these distances. Experiments on real data sets show that our approach is more accurate and more scalable than that of Henz et al. We can construct a phylogenetic tree for the same set of 91 genomes with an accuracy more than 20% higher (with respect to both evaluation measures) in 2 CPU hours, more than 80 times faster than their approach.
Our approach is also scalable: it can construct a phylogeny for 230 prokaryotic genomes with an accuracy as high as 85% in only 9.5 CPU hours.
Persistent Identifier	http://hdl.handle.net/10722/93474
ISBN	978-186094623-3
ISSN	1751-6404
References	References in Scopus
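The two evaluation measures named in the abstract, non-trivial splits and the maximum agreement subtree (MAST), can be illustrated with a short Python sketch. It is a hypothetical illustration, not the authors' code: it assumes rooted binary trees written as nested tuples, scores splits as the fraction of the model tree's non-trivial splits recovered, computes the MAST with the standard dynamic-programming recursion for rooted binary trees, and normalises the MAST size by the number of species. These conventions and all function names are assumptions; the exact formulas used in the paper are not stated in this record.

```python
from functools import lru_cache

# Hypothetical representation: a rooted binary tree is a nested tuple and
# leaves are species names (str), e.g. ((("A", "B"), "C"), ("D", "E")).


@lru_cache(maxsize=None)
def leaves(tree):
    """Return the set of leaf labels below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree (including the whole tree)."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def nontrivial_splits(tree):
    """Bipartitions induced by internal edges when the tree is viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same bipartition always gets the same key.
    """
    all_leaves = leaves(tree)
    n = len(all_leaves)
    ref = min(all_leaves)
    splits = set()
    for clade in clades(tree):
        side = clade if ref not in clade else all_leaves - clade
        if 2 <= len(side) <= n - 2:  # trivial splits cut off a single leaf
            splits.add(side)
    return splits


def split_accuracy(inferred, model):
    """Assumed measure: fraction of the model's non-trivial splits found in `inferred`."""
    assert leaves(inferred) == leaves(model), "trees must share one leaf set"
    model_splits = nontrivial_splits(model)
    if not model_splits:
        return 1.0
    return len(nontrivial_splits(inferred) & model_splits) / len(model_splits)


@lru_cache(maxsize=None)
def mast_size(u, v):
    """Leaf count of a maximum agreement subtree of two rooted binary trees."""
    if isinstance(u, str):
        return 1 if u in leaves(v) else 0
    if isinstance(v, str):
        return 1 if v in leaves(u) else 0
    (u1, u2), (v1, v2) = u, v
    return max(
        mast_size(u1, v1) + mast_size(u2, v2),  # match children pairwise
        mast_size(u1, v2) + mast_size(u2, v1),
        mast_size(u, v1), mast_size(u, v2),     # keep only one side of v
        mast_size(u1, v), mast_size(u2, v),     # keep only one side of u
    )


def mast_accuracy(inferred, model):
    """Assumed measure: MAST size divided by the number of species."""
    return mast_size(inferred, model) / len(leaves(model))


model = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(split_accuracy(inferred, model))  # 0.5
print(mast_accuracy(inferred, model))   # 0.8
```

On this toy pair the inferred tree recovers one of the model's two non-trivial splits (score 0.5), while an agreement subtree on four of the five species still exists (score 0.8), so the two measures can score the same tree quite differently.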

DC Field	Value	Language
dc.contributor.author	Chan, PY	en_HK
dc.contributor.author	Lam, TW	en_HK
dc.contributor.author	Yiu, SM	en_HK
dc.contributor.author	Liu, CM	en_HK
dc.date.accessioned	2010-09-25T15:02:15Z	-
dc.date.available	2010-09-25T15:02:15Z	-
dc.date.issued	2006	en_HK
dc.identifier.citation	Series on Advances in Bioinformatics and Computational Biology, 2006, v. 3, p. 337-352	en_HK
dc.identifier.isbn	978-186094623-3	-
dc.identifier.issn	1751-6404	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/93474	-
dc.language	eng	en_HK
dc.publisher	World Scientific Publishing Co Pte Ltd. The series' web site is located at http://www.worldscibooks.com/series/abcb_series.shtml	en_HK
dc.relation.ispartof	Series on Advances in Bioinformatics and Computational Biology	en_HK
dc.title	A more accurate and efficient whole genome phylogeny	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Lam, TW:twlam@cs.hku.hk	en_HK
dc.identifier.email	Yiu, SM:smyiu@cs.hku.hk	en_HK
dc.identifier.authority	Lam, TW=rp00135	en_HK
dc.identifier.authority	Yiu, SM=rp00207	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-84863050946	en_HK
dc.identifier.hkuros	118587	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-84856987338&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	3	en_HK
dc.identifier.spage	337	en_HK
dc.identifier.epage	352	en_HK
dc.publisher.place	Singapore	en_HK
dc.identifier.scopusauthorid	Chan, PY=26435793700	en_HK
dc.identifier.scopusauthorid	Lam, TW=7202523165	en_HK
dc.identifier.scopusauthorid	Yiu, SM=7003282240	en_HK
dc.identifier.scopusauthorid	Liu, CM=54984108400	en_HK
dc.identifier.issnl	1751-6404	-
