Conference Paper: Stratified random forest for genome-wide association study

Title: Stratified random forest for genome-wide association study
Authors: Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael
Keywords: stratified sampling; random forest classifier; significant SNP selection; genome-wide association study
Issue Date: 2011
Citation: Proceedings - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011, 2011, p. 10-15
Abstract: In high-dimensional genome-wide association (GWA) case-control data for complex diseases, a large proportion of single-nucleotide polymorphisms (SNPs) are usually irrelevant to the disease. Simple random sampling in a random forest, with the default mtry parameter used to choose the feature subspace, selects too many subspaces that contain no informative SNPs. An exhaustive search for an optimal mtry is often required to include useful, relevant SNPs and to discard the vast number of non-informative ones. However, such a search is very time-consuming and ill-suited to high-dimensional GWA studies. This paper proposes a stratified sampling method for feature subspace selection to generate the decision trees of a random forest for high-dimensional GWA data. We employ two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) to demonstrate that the proposed stratified sampling method is effective and that it generates better random forests, with higher accuracy and a lower error bound, than Breiman's random forest generation method. © 2011 IEEE.
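The abstract's argument can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the strata are built, so the two-stratum split by a per-SNP relevance score, the `n_strong` cutoff, the half-and-half allocation, and both function names are assumptions made here for illustration only.

```python
import random


def prob_no_informative(n_features, n_informative, mtry):
    """Probability that a simple random subspace of size mtry, drawn
    without replacement, contains no informative feature."""
    p = 1.0
    for i in range(mtry):
        p *= (n_features - n_informative - i) / (n_features - i)
    return p


def stratified_subspace(scores, mtry, n_strong, rng=random):
    """Draw mtry feature indices from two strata (hypothetical scheme).

    scores:   one relevance score per feature, higher = more promising
              (assumed to come from some per-SNP association statistic).
    n_strong: size of the high-score stratum (an assumed tuning choice).
    """
    # Rank features by score, best first, then split into two strata.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    strong, weak = order[:n_strong], order[n_strong:]
    # Assumed allocation: up to half of the subspace from the strong stratum,
    # so every subspace holds at least one high-score feature.
    k_strong = min(len(strong), max(1, mtry // 2))
    k_weak = min(len(weak), mtry - k_strong)
    return rng.sample(strong, k_strong) + rng.sample(weak, k_weak)
```

As a rough illustration, assume mtry = floor(sqrt(408,803)) = 639, a common default for classification, and suppose (hypothetically) that 100 SNPs are informative. Then `prob_no_informative(408803, 100, 639)` is about 0.86, so most simple random subspaces would contain no informative SNP at all, whereas a stratified draw of this kind includes high-score SNPs in every subspace by construction.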
Persistent Identifier: http://hdl.handle.net/10722/276923
ISI Accession Number ID: WOS:000411330600002

 

DC Field | Value | Language
dc.contributor.author | Wu, Qingyao | -
dc.contributor.author | Ye, Yunming | -
dc.contributor.author | Liu, Yang | -
dc.contributor.author | Ng, Michael | -
dc.date.accessioned | 2019-09-18T08:35:04Z | -
dc.date.available | 2019-09-18T08:35:04Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | Proceedings - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011, 2011, p. 10-15 | -
dc.identifier.uri | http://hdl.handle.net/10722/276923 | -
dc.language | eng | -
dc.relation.ispartof | Proceedings - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011 | -
dc.subject | stratified sampling | -
dc.subject | random forest classifier | -
dc.subject | significant SNP selection | -
dc.subject | Genome-wide association study | -
dc.title | Stratified random forest for genome-wide association study | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/BIBM.2011.9 | -
dc.identifier.scopus | eid_2-s2.0-84862941007 | -
dc.identifier.spage | 10 | -
dc.identifier.epage | 15 | -
dc.identifier.isi | WOS:000411330600002 | -
