Assessing clusters and motifs from gene expression data

Jakt, LM; Cao, L; Cheah, KSE; Smith, DK

Publisher Website: 10.1101/gr.148301
Scopus: eid_2-s2.0-0035156258
PMID: 11156620
WOS: WOS:000166361700011
Bookmarks:
- CiteULike: 2
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Biochemistry: Journal/Magazine Articles

Article: Assessing clusters and motifs from gene expression data

Title	Assessing clusters and motifs from gene expression data
Authors	Jakt, LM; Cao, L; Cheah, KSE; Smith, DK
Issue Date	2001
Publisher	Cold Spring Harbor Laboratory Press, Publications Department. The Journal's web site is located at http://www.genome.org
Citation	Genome Research, 2001, v. 11 n. 1, p. 112-123. DOI: http://dx.doi.org/10.1101/gr.148301
Abstract	Large-scale gene expression studies and genomic sequencing projects are providing vast amounts of information that can be used to identify or predict cellular regulatory processes. Genes can be clustered on the basis of the similarity of their expression profiles or function and these clusters are likely to contain genes that are regulated by the same transcription factors. Searches for cis-regulatory elements can then be undertaken in the noncoding regions of the clustered genes. However, it is necessary to assess the efficiency of both the gene clustering and the postulated regulatory motifs, as there are many difficulties associated with clustering and determining the functional relevance of matches to sequence motifs. We have developed a method to assess the potential functional significance of clusters and motifs based on the probability of finding a certain number of matches to a motif in all of the gene clusters. To avoid problems with threshold scores for a match, the top matches to a motif are taken in several sample sizes. Genes from a sample are then counted by the cluster in which they appear. The probability of observing these counts by chance is calculated using the hypergeometric distribution. Because of the multiple sample sizes, strong and weak matching motifs can be detected and refined and significant matches to motifs across cluster boundaries are observed as all clusters are considered. By applying this method to many motifs and to a cluster set of yeast genes, we detected a similarity between Swi Five Factor and forkhead proteins and suggest that the currently unidentified Swi Five Factor is one of the yeast forkhead proteins.
Persistent Identifier	http://hdl.handle.net/10722/68302
ISSN	1088-9051 (2023 Impact Factor: 6.2; 2023 SCImago Journal Rankings: 4.403)
PubMed Central ID	PMC311053
ISI Accession Number ID	WOS:000166361700011
References	References in Scopus
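
The scoring step described in the abstract (take the top matches to a motif at several sample sizes, count the sampled genes by cluster, and ask how likely those counts are by chance under the hypergeometric distribution) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the input layout (a ranked gene list plus a gene-to-cluster mapping), and the use of an upper-tail probability P(X >= k) rather than a point probability are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance of drawing at least k cluster members when n genes
    are sampled without replacement from N genes, K of which are in the cluster.
    (Using the upper tail is an assumption; the abstract does not specify it.)"""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def cluster_enrichment(ranked_genes, cluster_of, sample_sizes):
    """Hypothetical helper: for each sample size, take the top-ranked motif
    matches, count them per cluster, and score every cluster.

    ranked_genes : genes ordered from best to worst motif match (assumed input)
    cluster_of   : dict mapping every gene to its cluster label (assumed input)
    sample_sizes : iterable of top-n sample sizes to try
    """
    N = len(cluster_of)
    cluster_sizes = Counter(cluster_of.values())
    results = {}
    for n in sample_sizes:
        sample = ranked_genes[:n]
        counts = Counter(cluster_of[g] for g in sample)
        for cluster, K in cluster_sizes.items():
            results[(n, cluster)] = hypergeom_tail(
                counts.get(cluster, 0), N, K, len(sample)
            )
    return results


if __name__ == "__main__":
    # Toy data: 10 genes, cluster "A" has 4 members, cluster "B" has 6.
    clusters = {f"g{i}": ("A" if i < 4 else "B") for i in range(10)}
    ranking = [f"g{i}" for i in range(10)]  # pretend g0..g3 match the motif best
    for key, p in sorted(cluster_enrichment(ranking, clusters, [3, 5]).items()):
        print(key, round(p, 4))
```

Because every cluster is scored at every sample size, a motif whose best matches fall across a cluster boundary still shows up as low probabilities in more than one cluster, which is the behaviour the abstract highlights. With many motifs and sample sizes, a multiple-testing correction would also be needed; that is outside this sketch.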

DC Field	Value	Language
dc.contributor.author	Jakt, LM	en_HK
dc.contributor.author	Cao, L	en_HK
dc.contributor.author	Cheah, KSE	en_HK
dc.contributor.author	Smith, DK	en_HK
dc.date.accessioned	2010-09-06T06:03:17Z	-
dc.date.available	2010-09-06T06:03:17Z	-
dc.date.issued	2001	en_HK
dc.identifier.citation	Genome Research, 2001, v. 11 n. 1, p. 112-123	en_HK
dc.identifier.issn	1088-9051	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/68302	-
dc.description.abstract	Large-scale gene expression studies and genomic sequencing projects are providing vast amounts of information that can be used to identify or predict cellular regulatory processes. Genes can be clustered on the basis of the similarity of their expression profiles or function and these clusters are likely to contain genes that are regulated by the same transcription factors. Searches for cis-regulatory elements can then be undertaken in the noncoding regions of the clustered genes. However, it is necessary to assess the efficiency of both the gene clustering and the postulated regulatory motifs, as there are many difficulties associated with clustering and determining the functional relevance of matches to sequence motifs. We have developed a method to assess the potential functional significance of clusters and motifs based on the probability of finding a certain number of matches to a motif in all of the gene clusters. To avoid problems with threshold scores for a match, the top matches to a motif are taken in several sample sizes. Genes from a sample are then counted by the cluster in which they appear. The probability of observing these counts by chance is calculated using the hypergeometric distribution. Because of the multiple sample sizes, strong and weak matching motifs can be detected and refined and significant matches to motifs across cluster boundaries are observed as all clusters are considered. By applying this method to many motifs and to a cluster set of yeast genes, we detected a similarity between Swi Five Factor and forkhead proteins and suggest that the currently unidentified Swi Five Factor is one of the yeast forkhead proteins.	en_HK
dc.language	eng	en_HK
dc.publisher	Cold Spring Harbor Laboratory Press, Publications Department. The Journal's web site is located at http://www.genome.org	en_HK
dc.relation.ispartof	Genome Research	en_HK
dc.subject.mesh	Amino Acid Motifs - genetics	en_HK
dc.subject.mesh	Animals	en_HK
dc.subject.mesh	Cell Cycle - genetics	en_HK
dc.subject.mesh	Computational Biology - methods	en_HK
dc.subject.mesh	Databases, Factual	en_HK
dc.subject.mesh	Drosophila melanogaster - genetics	en_HK
dc.subject.mesh	Forkhead Transcription Factors	en_HK
dc.subject.mesh	Gene Expression Profiling - methods	en_HK
dc.subject.mesh	Helix-Loop-Helix Motifs - genetics	en_HK
dc.subject.mesh	Humans	en_HK
dc.subject.mesh	Mice	en_HK
dc.subject.mesh	Multigene Family - genetics	en_HK
dc.subject.mesh	Nuclear Proteins - genetics	en_HK
dc.subject.mesh	Rats	en_HK
dc.subject.mesh	Saccharomyces cerevisiae - cytology - genetics	en_HK
dc.subject.mesh	Transcription Factors - genetics	en_HK
dc.subject.mesh	Xenopus laevis - genetics	en_HK
dc.title	Assessing clusters and motifs from gene expression data	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1088-9051&volume=11&spage=112&epage=123&date=2001&atitle=Assessing+clusters+and+motifs+from+gene+expression+data	en_HK
dc.identifier.email	Cheah, KSE:hrmbdkc@hku.hk	en_HK
dc.identifier.authority	Cheah, KSE=rp00342	en_HK
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1101/gr.148301	en_HK
dc.identifier.pmid	11156620	-
dc.identifier.pmcid	PMC311053	-
dc.identifier.scopus	eid_2-s2.0-0035156258	en_HK
dc.identifier.hkuros	58491	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-0035156258&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	11	en_HK
dc.identifier.issue	1	en_HK
dc.identifier.spage	112	en_HK
dc.identifier.epage	123	en_HK
dc.identifier.isi	WOS:000166361700011	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Jakt, LM=6507406360	en_HK
dc.identifier.scopusauthorid	Cao, L=7401637818	en_HK
dc.identifier.scopusauthorid	Cheah, KSE=35387746200	en_HK
dc.identifier.scopusauthorid	Smith, DK=7410351143	en_HK
dc.identifier.citeulike	2931199	-
dc.identifier.issnl	1088-9051	-
