Conference Paper: Finding motifs for insufficient number of sequences with strong binding to transcription factor

Title: Finding motifs for insufficient number of sequences with strong binding to transcription factor
Authors: Chin, FYL; Leung, HCM; Yiu, SM; Lam, TW; Rosenfeld, R; Tsang, WW; Smith, DK; Jiang, Y
Keywords: Binding Energy; DNA Microarray; Motif Finding; Transcription Factor
Issue Date: 2004
Publisher: ACM.
Citation: The 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), San Diego, CA., 27-31 March 2004. In Conference Proceedings. 2004, v. 8, p. 125-132
Abstract: Finding motifs is an important problem in computational biology. Our paper makes two major contributions to this problem. Firstly, we better characterize the types of problem instances that cannot be solved by most existing methods of finding motifs. Secondly, we introduce a different method, which is shown to succeed for various problem instances for which popular existing methods fail. Most existing computational methods for finding motifs are based on the strong-signal model, wherein only strong-signal sequences (i.e. those that are known to contain binding sites very similar to the motif) are considered as input and weak-signal sequences (i.e. those that do not contain any substring similar to the motif) are disregarded. Buhler and Tompa have studied the limitations of methods based on the strong-signal model. They characterized the problem instances for which the motif is unlikely to be found in terms of the number of input (strong-signal) sequences needed, under the assumption that each input sequence contains exactly one binding site. They further gave a method to calculate the minimum number of input sequences required. We re-characterize the limitations of the strong-signal model in terms of the minimum total number of binding sites, rather than the minimum number of strong-signal sequences, required to be in the input data set. To calculate the minimum total number of binding sites required, we represent a motif by a probability matrix instead of a string pattern. This new characterization is shown to be more general and realistic. Next, we introduce a more general and realistic energy-based model, which considers all available sequences (including weak-signal sequences) with varying degrees of binding strength to the transcription factors (as measured experimentally by observed color intensity).
Given varying degrees of binding strength, our model can consider sequences ranging from those that contain more than one binding site to those that are weak sequences. By treating sequences with different degrees of binding strength differently, we develop a heuristic algorithm called EBMF (Energy-Based Motif Finding algorithm) using an EM-like approach to find motifs under our model. This EBMF algorithm can find motifs for data sets that do not even have the required minimum number of binding sites as previously derived for the strong-signal model. Our algorithm compares favorably with common motif-finding programs AlignACE and MEME, which are based on the strong-signal model. In particular, for some simulated and real data sets, our algorithm finds the motif when both AlignACE and MEME fail to do so.
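Illustrative note: the abstract contrasts a string pattern with a probability matrix as the representation of a motif. The Python sketch below shows only that generic representation; it is not the EBMF algorithm, whose details this record does not give. The function names, the pseudocount, the uniform background frequency of 0.25, and the toy binding sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_probability_matrix(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix

def log_odds_score(matrix, window, background=0.25):
    """Sum of log2(probability / background) over the positions of one window."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(matrix, window))

def best_window(matrix, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(matrix)
    scored = [(log_odds_score(matrix, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    score, start = max(scored)
    return start, score

# Hypothetical toy data: four aligned binding sites and one longer sequence.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
matrix = build_probability_matrix(sites)
start, score = best_window(matrix, "GGCGTATAATCCGG")
```

Unlike a single consensus string, the matrix keeps the observed variation at each position, so a window that differs from the consensus in one tolerated position still scores well.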
Persistent Identifier: http://hdl.handle.net/10722/93128

DC Field | Value | Language
dc.contributor.author | Chin, FYL | en_HK
dc.contributor.author | Leung, HCM | en_HK
dc.contributor.author | Yiu, SM | en_HK
dc.contributor.author | Lam, TW | en_HK
dc.contributor.author | Rosenfeld, R | en_HK
dc.contributor.author | Tsang, WW | en_HK
dc.contributor.author | Smith, DK | en_HK
dc.contributor.author | Jiang, Y | en_HK
dc.date.accessioned | 2010-09-25T14:51:44Z | -
dc.date.available | 2010-09-25T14:51:44Z | -
dc.date.issued | 2004 | en_HK
dc.identifier.citation | The 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), San Diego, CA., 27-31 March 2004. In Conference Proceedings. 2004, v. 8, p. 125-132 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/93128 | -
dc.language | eng | en_HK
dc.publisher | ACM. | -
dc.relation.ispartof | RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology | en_HK
dc.subject | Binding Energy | en_HK
dc.subject | DNA Microarray | en_HK
dc.subject | Motif Finding | en_HK
dc.subject | Transcription Factor | en_HK
dc.title | Finding motifs for insufficient number of sequences with strong binding to transcription factor | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.email | Chin, FYL: chin@cs.hku.hk | en_HK
dc.identifier.email | Leung, HCM: cmleung2@cs.hku.hk | en_HK
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_HK
dc.identifier.email | Lam, TW: twlam@cs.hku.hk | en_HK
dc.identifier.email | Tsang, WW: tsang@cs.hku.hk | en_HK
dc.identifier.email | Smith, DK: dsmith@hkucc.hku.hk | -
dc.identifier.authority | Chin, FYL=rp00105 | en_HK
dc.identifier.authority | Leung, HCM=rp00144 | en_HK
dc.identifier.authority | Yiu, SM=rp00207 | en_HK
dc.identifier.authority | Lam, TW=rp00135 | en_HK
dc.identifier.authority | Tsang, WW=rp00179 | en_HK
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-2442447220 | en_HK
dc.identifier.hkuros | 86051 | en_HK
dc.identifier.hkuros | 129061 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-2442447220&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 8 | en_HK
dc.identifier.spage | 125 | en_HK
dc.identifier.epage | 132 | en_HK
dc.identifier.scopusauthorid | Chin, FYL=7005101915 | en_HK
dc.identifier.scopusauthorid | Leung, HCM=35233742700 | en_HK
dc.identifier.scopusauthorid | Yiu, SM=7003282240 | en_HK
dc.identifier.scopusauthorid | Lam, TW=7202523165 | en_HK
dc.identifier.scopusauthorid | Rosenfeld, R=7201664625 | en_HK
dc.identifier.scopusauthorid | Tsang, WW=7201558521 | en_HK
dc.identifier.scopusauthorid | Smith, DK=7410351143 | en_HK
dc.identifier.scopusauthorid | Jiang, Y=7404832549 | en_HK
dc.customcontrol.immutable | sml 151014 - merged | -