Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1214/19-AOAS1312
- Scopus: eid_2-s2.0-85083693939
- WOS: WOS:000527373000020
Article: A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies
Title | A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies |
---|---|
Authors | Liu, Z; Barnett, I; Lin, X |
Keywords | Dimension reduction; Eigen-values; Hypothesis testing; Minimum p-value test; Multiple phenotypes |
Issue Date | 2020 |
Publisher | Institute of Mathematical Statistics. The Journal's web site is located at http://www.imstat.org/aoas/ |
Citation | The Annals of Applied Statistics, 2020, v. 14 n. 1, p. 433-451 |
Abstract | Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCA in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testing in both the multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum p-value test, in both the multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real-data analyses to validate our findings. |
Persistent Identifier | http://hdl.handle.net/10722/284608 |
ISSN | 1932-6157 (2023 Impact Factor: 1.3; 2023 SCImago Journal Rankings: 0.954) |
ISI Accession Number ID | WOS:000527373000020 |
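The abstract's contrast between lower-order and higher-order PCs can be illustrated with a back-of-the-envelope power calculation. This is a toy sketch under simplifying assumptions of our own, not the authors' derivation. In the multiple phenotype setting it assumes one standardized SNP with effect vector beta on the phenotypes, whose residual covariance has eigenpairs (lambda_k, u_k). In the SNP-set setting it assumes a linear model y = G'beta + e with genotype (LD) covariance eigenpairs (lambda_k, u_k), residual variance sigma2 and small effects. Under these assumptions a 1-df test on the k-th PC has noncentrality of roughly n(u_k'beta)^2/lambda_k in the first setting and n*lambda_k*(u_k'beta)^2/sigma2 in the second. The helper names (`pc_ncps`, `power_1df`, `exchangeable_2x2`) and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pc_ncps(
    eigenvalues: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    beta: Sequence[float],
    n: int,
    sigma2: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Approximate noncentrality of a 1-df test on each PC, in both settings.

    Simplified model (an assumption, not the paper's exact result):
      multiple phenotype: NCP_k = n * (u_k' beta)^2 / lambda_k
      SNP set (small effects): NCP_k = n * lambda_k * (u_k' beta)^2 / sigma2
    """
    mp, snp = [], []
    for lam, u in zip(eigenvalues, eigenvectors):
        proj2 = dot(u, beta) ** 2             # squared projection of beta on PC k
        mp.append(n * proj2 / lam)            # small lambda_k boosts the signal
        snp.append(n * lam * proj2 / sigma2)  # large lambda_k boosts the signal
    return mp, snp


def power_1df(ncp: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test whose squared statistic is chi2_1(ncp)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


def exchangeable_2x2(rho: float) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Eigenpairs of [[1, rho], [rho, 1]], largest eigenvalue first (0 < rho < 1)."""
    s = 1 / math.sqrt(2)
    return [1 + rho, 1 - rho], [(s, s), (s, -s)]


# Toy example (hypothetical numbers): correlation 0.8, effect only on the
# first variable, n = 200. The same 2x2 matrix stands in for the phenotype
# covariance in one setting and the genotype LD matrix in the other.
lams, vecs = exchangeable_2x2(0.8)
mp_ncp, snp_ncp = pc_ncps(lams, vecs, beta=(0.2, 0.0), n=200)
for k in range(2):
    print(
        f"PC{k + 1} (lambda={lams[k]:.1f}): "
        f"multiple-phenotype power={power_1df(mp_ncp[k]):.3f}, "
        f"SNP-set power={power_1df(snp_ncp[k]):.3f}"
    )
```

Because this beta projects equally onto both PCs, the small-eigenvalue PC2 gets the higher power in the multiple-phenotype column and the large-eigenvalue PC1 gets it in the SNP-set column, which is the direction the abstract describes. In practice the ranking also depends on how beta aligns with the eigenvectors, so the example shows the mechanism rather than a general rule.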
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Z | - |
dc.contributor.author | Barnett, I | - |
dc.contributor.author | Lin, X | - |
dc.date.accessioned | 2020-08-07T09:00:05Z | - |
dc.date.available | 2020-08-07T09:00:05Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | The Annals of Applied Statistics, 2020, v. 14 n. 1, p. 433-451 | - |
dc.identifier.issn | 1932-6157 | - |
dc.identifier.uri | http://hdl.handle.net/10722/284608 | - |
dc.language | eng | - |
dc.publisher | Institute of Mathematical Statistics. The Journal's web site is located at http://www.imstat.org/aoas/ | - |
dc.relation.ispartof | The Annals of Applied Statistics | - |
dc.subject | Dimension reduction | - |
dc.subject | Eigen-values | - |
dc.subject | Hypothesis testing | - |
dc.subject | Minimum p-value test | - |
dc.subject | Multiple phenotypes | - |
dc.title | A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies | - |
dc.type | Article | - |
dc.identifier.email | Liu, Z: zhhliu@hku.hk | - |
dc.identifier.authority | Liu, Z=rp02429 | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1214/19-AOAS1312 | - |
dc.identifier.scopus | eid_2-s2.0-85083693939 | - |
dc.identifier.hkuros | 312168 | - |
dc.identifier.volume | 14 | - |
dc.identifier.issue | 1 | - |
dc.identifier.spage | 433 | - |
dc.identifier.epage | 451 | - |
dc.identifier.isi | WOS:000527373000020 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 1932-6157 | - |