File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: Aligning multiple sequences adaptively

TitleAligning multiple sequences adaptively
Authors
Advisors
Advisor(s):Ting, HF
Issue Date2014
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Ye, Y. [叶永滔]. (2014). Aligning multiple sequences adaptively. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5317071
AbstractWith the rapid development of genome sequencing, an ever-increasing number of molecular biology analyses rely on the construction of an accurate multiple sequence alignment (MSA), such as motifs detection, phylogeny inference and structure prediction. Although many methods have been developed during the last two decades, most of them may perform poorly on some types of inputs, in particular when families of sequences fall below thirty percent similarity. Therefore, this thesis introduced two different effective approaches to improve the overall quality of multiple sequence alignment. First, by considering the similarity of the input sequences, we proposed an adaptive approach to compute better substitution matrices for each pair of sequences, and then apply the progressive alignment method to align them. For example, for inputs with high similarity, we consider the whole sequences and align them with global pair-Hidden Markov model, while for those with moderate low similarity, we may ignore the ank regions and use some local pair-Hidden Markov models to align them. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with one dozen leading tools on three benchmark alignment databases, and GLProbs' alignments have the best scores in almost all testings. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic tree reconstruction, protein secondary structure prediction and the detection of high risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging. Second, based on our previous study, we proposed another new tool PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies input sequences into two types: normally related sequences and distantly related sequences. For normally related sequences, it uses an adaptive approach to construct the guide tree, and based on this guide tree, aligns the sequences progressively. To be more precise, it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the best method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree; instead it uses the non-progressive sequence annealing method to construct the multiple sequence alignment. By combining the strength of the progressive and non-progressive methods, and with a better way to construct the guide tree, PnpProbs improves the quality of multiple sequence alignments significantly for not only general input sequences, but also those very distantly related. With those encouraging empirical results, our developed software tools have been appreciated by the community gradually. For example, GLProbs has been invited and incorporated into the JAva Bioinformatics Analysis Web Services system (JABAWS).
DegreeMaster of Philosophy
SubjectSequence alignment (Bioinformatics)
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/206465

 

DC FieldValueLanguage
dc.contributor.advisorTing, HF-
dc.contributor.authorYe, Yongtao-
dc.contributor.author叶永滔-
dc.date.accessioned2014-10-31T23:15:57Z-
dc.date.available2014-10-31T23:15:57Z-
dc.date.issued2014-
dc.identifier.citationYe, Y. [叶永滔]. (2014). Aligning multiple sequences adaptively. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5317071-
dc.identifier.urihttp://hdl.handle.net/10722/206465-
dc.description.abstractWith the rapid development of genome sequencing, an ever-increasing number of molecular biology analyses rely on the construction of an accurate multiple sequence alignment (MSA), such as motifs detection, phylogeny inference and structure prediction. Although many methods have been developed during the last two decades, most of them may perform poorly on some types of inputs, in particular when families of sequences fall below thirty percent similarity. Therefore, this thesis introduced two different effective approaches to improve the overall quality of multiple sequence alignment. First, by considering the similarity of the input sequences, we proposed an adaptive approach to compute better substitution matrices for each pair of sequences, and then apply the progressive alignment method to align them. For example, for inputs with high similarity, we consider the whole sequences and align them with global pair-Hidden Markov model, while for those with moderate low similarity, we may ignore the ank regions and use some local pair-Hidden Markov models to align them. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with one dozen leading tools on three benchmark alignment databases, and GLProbs' alignments have the best scores in almost all testings. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic tree reconstruction, protein secondary structure prediction and the detection of high risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging. Second, based on our previous study, we proposed another new tool PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies input sequences into two types: normally related sequences and distantly related sequences. For normally related sequences, it uses an adaptive approach to construct the guide tree, and based on this guide tree, aligns the sequences progressively. To be more precise, it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the best method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree; instead it uses the non-progressive sequence annealing method to construct the multiple sequence alignment. By combining the strength of the progressive and non-progressive methods, and with a better way to construct the guide tree, PnpProbs improves the quality of multiple sequence alignments significantly for not only general input sequences, but also those very distantly related. With those encouraging empirical results, our developed software tools have been appreciated by the community gradually. For example, GLProbs has been invited and incorporated into the JAva Bioinformatics Analysis Web Services system (JABAWS).-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsCreative Commons: Attribution 3.0 Hong Kong License-
dc.subject.lcshSequence alignment (Bioinformatics)-
dc.titleAligning multiple sequences adaptively-
dc.typePG_Thesis-
dc.identifier.hkulb5317071-
dc.description.thesisnameMaster of Philosophy-
dc.description.thesislevelMaster-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_b5317071-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats