Article: Decision trees for uncertain data

Title: Decision trees for uncertain data
Authors: Tsang, S; Kao, B; Yip, KY; Ho, WS; Lee, SD
Keywords: classification; data mining; decision tree; uncertain data
Issue Date: 2011
Publisher: IEEE. The Journal's web site is located at http://www.computer.org/tkde
Citation: IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23 n. 1, p. 64-78
Abstract: Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
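
To illustrate the distribution-based idea the abstract describes (using the pdf of each uncertain value rather than a single summary value when evaluating a split), here is a minimal sketch. It is not the authors' algorithm or their pruning techniques; the Gaussian model of uncertainty and the names `UncertainValue`, `prob_leq`, and `split_entropy` are illustrative assumptions. The point it shows is that a tuple with an uncertain attribute value is sent fractionally to both sides of a candidate split, weighted by the probability mass of its pdf on each side.

```python
# Minimal sketch (assumption: Gaussian pdfs for uncertain numeric values).
# Not the paper's UDT algorithm; it only demonstrates fractional tuple
# propagation across a candidate split and the resulting weighted entropy.
import math
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """An uncertain attribute value modeled here as a Gaussian pdf."""
    mean: float
    std: float

    def prob_leq(self, z: float) -> float:
        """P(value <= z), i.e., the Gaussian CDF evaluated at z."""
        return 0.5 * (1.0 + math.erf((z - self.mean) / (self.std * math.sqrt(2.0))))


def entropy(class_weights: dict) -> float:
    """Shannon entropy of a (possibly fractional) class-weight distribution."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, z):
    """Weighted entropy of the split 'value <= z' with fractional tuples.

    `tuples` is a list of (UncertainValue, class_label, weight) triples.
    Each tuple contributes weight * P(value <= z) of itself to the left
    child and the remaining probability mass to the right child.
    """
    left, right = {}, {}
    for uv, label, w in tuples:
        p_left = uv.prob_leq(z)
        left[label] = left.get(label, 0.0) + w * p_left
        right[label] = right.get(label, 0.0) + w * (1.0 - p_left)
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


if __name__ == "__main__":
    # Two classes whose uncertain values overlap; compare candidate splits.
    data = [
        (UncertainValue(1.0, 0.5), "A", 1.0),
        (UncertainValue(1.5, 0.5), "A", 1.0),
        (UncertainValue(3.0, 0.5), "B", 1.0),
        (UncertainValue(3.5, 0.5), "B", 1.0),
    ]
    for z in (1.0, 2.25, 3.0):
        print(f"split at {z}: weighted entropy = {split_entropy(data, z):.3f}")
```

In a full tree builder, a split point would be chosen to minimize this weighted entropy (maximize information gain); the paper's pruning techniques address the cost of evaluating many such candidate splits over pdfs.
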
Persistent Identifier: http://hdl.handle.net/10722/152447
ISSN: 1041-4347
2023 Impact Factor: 8.9
2023 SCImago Journal Rankings: 2.867
ISI Accession Number ID: WOS:000284422700006
Funding Agency | Grant Number
Hong Kong Research Grants Council | HKU 7134/06E

Funding Information: This research is supported by the Hong Kong Research Grants Council Grant HKU 7134/06E.


DC Field | Value | Language
dc.contributor.author | Tsang, S | en_HK
dc.contributor.author | Kao, B | en_HK
dc.contributor.author | Yip, KY | en_HK
dc.contributor.author | Ho, WS | en_HK
dc.contributor.author | Lee, SD | en_HK
dc.date.accessioned | 2012-06-26T06:39:10Z | -
dc.date.available | 2012-06-26T06:39:10Z | -
dc.date.issued | 2011 | en_HK
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23 n. 1, p. 64-78 | en_HK
dc.identifier.issn | 1041-4347 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/152447 | -
dc.description.abstract | Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency. | en_HK
dc.language | eng | en_US
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde | en_HK
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | en_HK
dc.rights | ©2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.subject | classification | en_HK
dc.subject | data mining | en_HK
dc.subject | decision tree | en_HK
dc.subject | Uncertain data | en_HK
dc.title | Decision trees for uncertain data | en_HK
dc.type | Article | en_HK
dc.identifier.email | Kao, B: kao@cs.hku.hk | en_HK
dc.identifier.email | Ho, WS: wsho@cs.hku.hk | en_HK
dc.identifier.authority | Kao, B=rp00123 | en_HK
dc.identifier.authority | Ho, WS=rp01730 | en_HK
dc.description.nature | published_or_final_version | en_US
dc.identifier.doi | 10.1109/TKDE.2009.175 | en_HK
dc.identifier.scopus | eid_2-s2.0-78649401025 | en_HK
dc.identifier.hkuros | 186897 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78649401025&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 23 | en_HK
dc.identifier.issue | 1 | en_HK
dc.identifier.spage | 64 | en_HK
dc.identifier.epage | 78 | en_HK
dc.identifier.isi | WOS:000284422700006 | -
dc.publisher.place | United States | en_HK
dc.relation.project | Computational issues in mining uncertain data | -
dc.identifier.scopusauthorid | Tsang, S=26666352300 | en_HK
dc.identifier.scopusauthorid | Kao, B=35221592600 | en_HK
dc.identifier.scopusauthorid | Yip, KY=7101909946 | en_HK
dc.identifier.scopusauthorid | Ho, WS=7402968940 | en_HK
dc.identifier.scopusauthorid | Lee, SD=7601400741 | en_HK
dc.identifier.citeulike | 11892932 | -
dc.identifier.issnl | 1041-4347 | -
