File Download
Supplementary

Citations:
 Appears in Collections:
postgraduate thesis: Phylogenetic tree reconstruction with protein linkage
Title  Phylogenetic tree reconstruction with protein linkage 

Authors  
Advisors  Advisor(s):Chin, FYL 
Issue Date  2012 
Publisher  The University of Hong Kong (Pokfulam, Hong Kong) 
Citation  Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816 
Abstract  Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the FitchHartigan algorithm and the Sankoff algorithm are two polynomialtime algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent.
Since the function of a protein may depend on the existence of other proteins, the evolutionary history of these functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, the protein linkage distance for some pairs/sets of proteins: whole block linkage distance, partial block linkage distance, pairwise linkage distance is introduced. It is proved that the phylogenetic tree reconstruction problem to find the binary strings for the internal nodes of a phylogenetic tree that minimizes the sum of the Hamming distance and the linkage distance is NPhard.
In this thesis, a general algorithm to solve the phylogenetic tree reconstruction with protein linkage problem which runs in O(4^m⋅n) time for whole/partial block linkage distance and O(4^m⋅⋅ (m+n)) time for pairwise linkage distance (compared to the straightforward O(4^m⋅ m⋅ n) or O(4^m⋅ m^2⋅⋅ n) time algorithm) is introduced where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance. 
Degree  Master of Philosophy 
Subject  Phylogeny. Combinatorial analysis. 
Dept/Program  Computer Science 
Persistent Identifier  http://hdl.handle.net/10722/181488 
DC Field  Value  Language 

dc.contributor.advisor  Chin, FYL   
dc.contributor.author  Yu, Junjie.   
dc.contributor.author  于俊杰.   
dc.date.accessioned  20130303T03:20:04Z   
dc.date.available  20130303T03:20:04Z   
dc.date.issued  2012   
dc.identifier.citation  Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816   
dc.identifier.uri  http://hdl.handle.net/10722/181488   
dc.description.abstract  Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the FitchHartigan algorithm and the Sankoff algorithm are two polynomialtime algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent. Since the function of a protein may depend on the existence of other proteins, the evolutionary history of these functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, the protein linkage distance for some pairs/sets of proteins: whole block linkage distance, partial block linkage distance, pairwise linkage distance is introduced. It is proved that the phylogenetic tree reconstruction problem to find the binary strings for the internal nodes of a phylogenetic tree that minimizes the sum of the Hamming distance and the linkage distance is NPhard. In this thesis, a general algorithm to solve the phylogenetic tree reconstruction with protein linkage problem which runs in O(4^m⋅n) time for whole/partial block linkage distance and O(4^m⋅⋅ (m+n)) time for pairwise linkage distance (compared to the straightforward O(4^m⋅ m⋅ n) or O(4^m⋅ m^2⋅⋅ n) time algorithm) is introduced where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance.   
dc.language  eng   
dc.publisher  The University of Hong Kong (Pokfulam, Hong Kong)   
dc.relation.ispartof  HKU Theses Online (HKUTO)   
dc.rights  The author retains all proprietary rights, (such as patent rights) and the right to use in future works.   
dc.rights  Creative Commons: Attribution 3.0 Hong Kong License   
dc.source.uri  http://hub.hku.hk/bib/B49618167   
dc.subject.lcsh  Phylogeny.   
dc.subject.lcsh  Combinatorial analysis.   
dc.title  Phylogenetic tree reconstruction with protein linkage   
dc.type  PG_Thesis   
dc.identifier.hkul  b4961816   
dc.description.thesisname  Master of Philosophy   
dc.description.thesislevel  Master   
dc.description.thesisdiscipline  Computer Science   
dc.description.nature  published_or_final_version   
dc.identifier.doi  10.5353/th_b4961816   
dc.date.hkucongregation  2013   