Article: Stratified sampling for feature subspace selection in random forests for high dimensional data

Title: Stratified sampling for feature subspace selection in random forests for high dimensional data
Authors: Ye, Yunming; Wu, Qingyao; Zhexue Huang, Joshua; Ng, Michael K.; Li, Xutao
Keywords: Classification; Ensemble classifier; High-dimensional data; Stratified sampling; Random forests; Decision trees
Issue Date: 2013
Citation: Pattern Recognition, 2013, v. 46, n. 3, p. 769-787
Abstract: For high dimensional data, a large portion of the features is often not informative of the class of the objects. Random forest algorithms tend to use simple random sampling of features when building their decision trees, and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups: one group contains strongly informative features and the other weakly informative features. Then, for feature subspace selection, we randomly select features from each group proportionally. The advantage of stratified sampling is that each subspace is ensured to contain enough informative features for classification in high dimensional data. Testing on both synthetic data and various real data sets in gene classification, image categorization and face recognition consistently demonstrates the effectiveness of this new method. The performance is shown to be better than that of state-of-the-art algorithms including SVM, four variants of random forests (RF, ERT, enrich-RF, and oblique-RF), and nearest neighbor (NN) algorithms. © 2012 Elsevier Ltd.
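The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how informativeness is measured or where the strong/weak boundary lies, so the per-feature `scores` input, the mean-score threshold, the rule of keeping at least one feature from each group, and the function name `stratified_feature_subspace` are all assumptions made for illustration.

```python
import random


def stratified_feature_subspace(scores, subspace_size, rng):
    """Return feature indices drawn proportionally from a strong and a weak group.

    `scores` holds one informativeness value per feature (hypothetical input;
    the abstract does not specify the measure).
    """
    n = len(scores)
    if not 2 <= subspace_size <= n:
        raise ValueError("subspace_size must be between 2 and the number of features")

    # Assumption: features scoring above the mean are "strong", the rest "weak".
    mean_score = sum(scores) / n
    strong = [i for i, s in enumerate(scores) if s > mean_score]
    weak = [i for i, s in enumerate(scores) if s <= mean_score]

    # Degenerate case (for example, all scores equal): only one group exists,
    # so fall back to simple random sampling over all features.
    if not strong or not weak:
        return rng.sample(range(n), subspace_size)

    # Proportional allocation, clamped so that each group contributes
    # at least one feature (an assumption, not stated in the abstract).
    n_strong = round(subspace_size * len(strong) / n)
    n_strong = min(max(n_strong, 1), len(strong), subspace_size - 1)
    n_weak = min(subspace_size - n_strong, len(weak))
    # If the weak group is too small, top up from the strong group.
    n_strong = min(len(strong), subspace_size - n_weak)

    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)


if __name__ == "__main__":
    # 10 strong features (indices 0-9) among 100; draw a subspace of 10.
    demo_scores = [1.0] * 10 + [0.0] * 90
    print(sorted(stratified_feature_subspace(demo_scores, 10, random.Random(0))))
```

In a random forest, a function like this would be called wherever a feature subspace is drawn, in place of drawing `subspace_size` indices uniformly at random. With simple random sampling a subspace can miss the strong group entirely; here every subspace contains at least one strong feature whenever both groups are non-empty.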
Persistent Identifier: http://hdl.handle.net/10722/276943
ISSN: 0031-3203
2023 Impact Factor: 7.5
2023 SCImago Journal Rankings: 2.732
ISI Accession Number ID: WOS:000313385700014

 

DC Field | Value | Language
dc.contributor.author | Ye, Yunming | -
dc.contributor.author | Wu, Qingyao | -
dc.contributor.author | Zhexue Huang, Joshua | -
dc.contributor.author | Ng, Michael K. | -
dc.contributor.author | Li, Xutao | -
dc.date.accessioned | 2019-09-18T08:35:07Z | -
dc.date.available | 2019-09-18T08:35:07Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Pattern Recognition, 2013, v. 46, n. 3, p. 769-787 | -
dc.identifier.issn | 0031-3203 | -
dc.identifier.uri | http://hdl.handle.net/10722/276943 | -
dc.description.abstract | For high dimensional data, a large portion of the features is often not informative of the class of the objects. Random forest algorithms tend to use simple random sampling of features when building their decision trees, and consequently select many subspaces that contain few, if any, informative features. In this paper we propose a stratified sampling method to select the feature subspaces for random forests with high dimensional data. The key idea is to stratify features into two groups: one group contains strongly informative features and the other weakly informative features. Then, for feature subspace selection, we randomly select features from each group proportionally. The advantage of stratified sampling is that each subspace is ensured to contain enough informative features for classification in high dimensional data. Testing on both synthetic data and various real data sets in gene classification, image categorization and face recognition consistently demonstrates the effectiveness of this new method. The performance is shown to be better than that of state-of-the-art algorithms including SVM, four variants of random forests (RF, ERT, enrich-RF, and oblique-RF), and nearest neighbor (NN) algorithms. © 2012 Elsevier Ltd. | -
dc.language | eng | -
dc.relation.ispartof | Pattern Recognition | -
dc.subject | Classification | -
dc.subject | Ensemble classifier | -
dc.subject | High-dimensional data | -
dc.subject | Stratified sampling | -
dc.subject | Random forests | -
dc.subject | Decision trees | -
dc.title | Stratified sampling for feature subspace selection in random forests for high dimensional data | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/j.patcog.2012.09.005 | -
dc.identifier.scopus | eid_2-s2.0-84870244637 | -
dc.identifier.volume | 46 | -
dc.identifier.issue | 3 | -
dc.identifier.spage | 769 | -
dc.identifier.epage | 787 | -
dc.identifier.isi | WOS:000313385700014 | -
dc.identifier.issnl | 0031-3203 | -