Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes

Pang, H; Jung, S-H

Links for fulltext

Publisher Website: 10.1002/gepi.21721
Scopus: eid_2-s2.0-84875647843
PMID: 23471879
WOS: WOS:000316810600006

Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Public Health: Journal/Magazine Articles

Title	Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes
Authors	Pang, H; Jung, S-H
Keywords	Gene expression; GWAS; High-dimensional data; Prediction validation; Sample size; Survival
Issue Date	2013
Citation	Genetic Epidemiology, 2013, v. 37 n. 3, p. 276-282. DOI: http://dx.doi.org/10.1002/gepi.21721
Abstract	A variety of prediction methods are used to relate high-dimensional genomic data to a clinical outcome through a prediction model. Once a prediction model is developed from a data set, it should be validated using a resampling method or an independent data set. Although existing prediction methods have been evaluated intensively by many investigators, there has been no comprehensive study of the performance of the validation methods, especially for survival outcomes. Understanding the properties of the various validation methods allows researchers to perform more powerful validations while controlling the type I error. In addition, a sample size calculation strategy based on these validation methods is lacking. We conducted extensive simulations to examine the statistical properties of these validation strategies. In both the simulations and a real data example, we found that 10-fold cross-validation with permutation gave the best power while keeping the type I error close to the nominal level. Based on this finding, we also developed a sample size calculation method for designing a validation study with a user-chosen combination of prediction and validation methods. Microarray and genome-wide association study data are used as illustrations. The power calculation method presented here can be used to design any biomedical study involving high-dimensional data and survival outcomes. © 2013 Wiley Periodicals, Inc.
Persistent Identifier	http://hdl.handle.net/10722/194384
ISSN	0741-0395 (2023 Impact Factor: 1.7; 2023 SCImago Journal Rankings: 0.977)
ISI Accession Number ID	WOS:000316810600006

DC Field	Value	Language
dc.contributor.author	Pang, H	-
dc.contributor.author	Jung, S-H	-
dc.date.accessioned	2014-01-30T03:32:31Z	-
dc.date.available	2014-01-30T03:32:31Z	-
dc.date.issued	2013	-
dc.identifier.citation	Genetic Epidemiology, 2013, v. 37 n. 3, p. 276-282	-
dc.identifier.issn	0741-0395	-
dc.identifier.uri	http://hdl.handle.net/10722/194384	-
dc.language	eng	-
dc.relation.ispartof	Genetic Epidemiology	-
dc.subject	Gene expression	-
dc.subject	GWAS	-
dc.subject	High-dimensional data	-
dc.subject	Prediction validation	-
dc.subject	Sample size	-
dc.subject	Survival	-
dc.title	Sample size considerations of prediction-validation methods in high-dimensional data for survival outcomes	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1002/gepi.21721	-
dc.identifier.pmid	23471879	-
dc.identifier.scopus	eid_2-s2.0-84875647843	-
dc.identifier.volume	37	-
dc.identifier.issue	3	-
dc.identifier.spage	276	-
dc.identifier.epage	282	-
dc.identifier.isi	WOS:000316810600006	-
dc.identifier.issnl	0741-0395	-