Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

Yao, J; Chang, C; Salmi, ML; Hung, YS; Loraine, A; Roux, SJ

File Download

1471-2105-9-288.pdf

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1186/1471-2105-9-288
Scopus: eid_2-s2.0-47349103055
PMID: 18564431
WOS: WOS:000257669300001

Bookmarks:
- CiteULike: 2
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Electrical & Electronic Engineering: Journal/Magazine Articles

Article: Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

Title	Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient
Authors	Yao, J; Chang, C; Salmi, ML; Hung, YS; Loraine, A; Roux, SJ
Issue Date	2008
Publisher	BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/
Citation	BMC Bioinformatics, 2008, v. 9, article no. 288. DOI: http://dx.doi.org/10.1186/1471-2105-9-288
Abstract	Background: Clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two correlations most widely used as similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. Results: In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with the two correlation coefficients that are currently most widely used (the Pearson correlation coefficient and the SD-weighted correlation coefficient), using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns.
Conclusion: This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology. © 2008 Yao et al; licensee BioMed Central Ltd.
Persistent Identifier	http://hdl.handle.net/10722/57431
ISSN	1471-2105 (2023 Impact Factor: 2.9; 2023 SCImago Journal Rankings: 1.005)
PubMed Central ID	PMC2459189
ISI Accession Number ID	WOS:000257669300001
References	References in Scopus

DC Field	Value	Language
dc.contributor.author	Yao, J	en_HK
dc.contributor.author	Chang, C	en_HK
dc.contributor.author	Salmi, ML	en_HK
dc.contributor.author	Hung, YS	en_HK
dc.contributor.author	Loraine, A	en_HK
dc.contributor.author	Roux, SJ	en_HK
dc.date.accessioned	2010-04-12T01:36:43Z	-
dc.date.available	2010-04-12T01:36:43Z	-
dc.date.issued	2008	en_HK
dc.identifier.citation	BMC Bioinformatics, 2008, v. 9	en_HK
dc.identifier.issn	1471-2105	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/57431	-
dc.description.abstract	Background: Clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two correlations most widely used as similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. Results: In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with the two correlation coefficients that are currently most widely used (the Pearson correlation coefficient and the SD-weighted correlation coefficient), using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns.
Conclusion: This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology. © 2008 Yao et al; licensee BioMed Central Ltd.	en_HK
dc.language	eng	en_HK
dc.publisher	BioMed Central Ltd. The Journal's web site is located at http://www.biomedcentral.com/bmcbioinformatics/	en_HK
dc.relation.ispartof	BMC Bioinformatics	en_HK
dc.rights	BMC Bioinformatics. Copyright © BioMed Central Ltd.	en_HK
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.mesh	Computational Biology - methods	en_HK
dc.subject.mesh	Genomics - methods	en_HK
dc.subject.mesh	Oligonucleotide Array Sequence Analysis - methods - statistics & numerical data	en_HK
dc.subject.mesh	Probability	en_HK
dc.subject.mesh	Artificial Intelligence	en_HK
dc.title	Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1471-2105&volume=9 article no. 288&spage=&epage=&date=2008&atitle=Genome-scale+cluster+analysis+of+replicated+microarrays+using+shrinkage+correlation+coefficient	en_HK
dc.identifier.email	Chang, C: cqchang@eee.hku.hk	en_HK
dc.identifier.email	Hung, YS: yshung@hkucc.hku.hk	en_HK
dc.identifier.authority	Chang, C=rp00095	en_HK
dc.identifier.authority	Hung, YS=rp00220	en_HK
dc.description.nature	published_or_final_version	en_HK
dc.identifier.doi	10.1186/1471-2105-9-288	en_HK
dc.identifier.pmid	18564431	en_HK
dc.identifier.pmcid	PMC2459189	en_HK
dc.identifier.scopus	eid_2-s2.0-47349103055	en_HK
dc.identifier.hkuros	146246	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-47349103055&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	9	en_HK
dc.identifier.isi	WOS:000257669300001	-
dc.publisher.place	United Kingdom	en_HK
dc.identifier.scopusauthorid	Yao, J=55478507300	en_HK
dc.identifier.scopusauthorid	Chang, C=7407033052	en_HK
dc.identifier.scopusauthorid	Salmi, ML=8896681400	en_HK
dc.identifier.scopusauthorid	Hung, YS=8091656200	en_HK
dc.identifier.scopusauthorid	Loraine, A=6603339553	en_HK
dc.identifier.scopusauthorid	Roux, SJ=7103057924	en_HK
dc.identifier.citeulike	2908848	-
dc.identifier.issnl	1471-2105	-
