Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/TKDE.2009.175
- Scopus: eid_2-s2.0-78649401025
- Web of Science: WOS:000284422700006
Article: Decision trees for uncertain data
Title | Decision trees for uncertain data
Authors | Tsang, S; Kao, B; Yip, KY; Ho, WS; Lee, SD
Keywords | classification; data mining; decision tree; uncertain data
Issue Date | 2011
Publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde
Citation | IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23, n. 1, p. 64-78
Abstract | Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
Persistent Identifier | http://hdl.handle.net/10722/152447
ISSN | 1041-4347
2023 Impact Factor | 8.9
2023 SCImago Journal Rankings | 2.867
ISI Accession Number | WOS:000284422700006
Funding Information | This research is supported by the Hong Kong Research Grants Council Grant HKU 7134/06E.
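The abstract's central idea — evaluating a split against each tuple's full pdf rather than collapsing it to an average — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual algorithm: the tuple layout, the function names, and the representation of each pdf as discrete (value, probability) samples are all assumptions made here for the example.

```python
import math

def entropy(weights_by_class):
    """Shannon entropy over fractional (probability-mass) class counts."""
    total = sum(weights_by_class.values())
    h = 0.0
    for w in weights_by_class.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h

def split_entropy(tuples, z):
    """Weighted post-split entropy of splitting an attribute at point z.

    Each tuple is (samples, label), where samples is a list of
    (value, prob) pairs approximating the attribute's pdf.
    A tuple contributes fractional mass to BOTH sides of the split,
    instead of falling entirely on one side as its mean would.
    """
    left, right = {}, {}
    for samples, label in tuples:
        mass_left = sum(p for v, p in samples if v <= z)
        left[label] = left.get(label, 0.0) + mass_left
        right[label] = right.get(label, 0.0) + (1.0 - mass_left)
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)

# Two tuples whose pdfs interact with a candidate split at z = 5:
data = [
    ([(4.0, 0.5), (6.0, 0.5)], "A"),  # pdf straddles the split point
    ([(7.0, 1.0)], "B"),              # point mass to the right
]
print(split_entropy(data, 5.0))
```

Note that the first tuple's mean is 5.0, so an averages-based classifier would place it entirely on one side of the split; here half its probability mass goes left and half goes right, and the entropy is computed over those fractional class counts — the "complete information" the abstract refers to.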
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tsang, S | en_HK |
dc.contributor.author | Kao, B | en_HK |
dc.contributor.author | Yip, KY | en_HK |
dc.contributor.author | Ho, WS | en_HK |
dc.contributor.author | Lee, SD | en_HK |
dc.date.accessioned | 2012-06-26T06:39:10Z | - |
dc.date.available | 2012-06-26T06:39:10Z | - |
dc.date.issued | 2011 | en_HK |
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23 n. 1, p. 64-78 | en_HK |
dc.identifier.issn | 1041-4347 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/152447 | - |
dc.description.abstract | Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by a single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency. | en_HK
dc.language | eng | en_US |
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde | en_HK |
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | en_HK |
dc.rights | ©2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | - |
dc.subject | classification | en_HK |
dc.subject | data mining | en_HK |
dc.subject | decision tree | en_HK |
dc.subject | Uncertain data | en_HK |
dc.title | Decision trees for uncertain data | en_HK |
dc.type | Article | en_HK |
dc.identifier.email | Kao, B: kao@cs.hku.hk | en_HK |
dc.identifier.email | Ho, WS: wsho@cs.hku.hk | en_HK |
dc.identifier.authority | Kao, B=rp00123 | en_HK |
dc.identifier.authority | Ho, WS=rp01730 | en_HK |
dc.description.nature | published_or_final_version | en_US |
dc.identifier.doi | 10.1109/TKDE.2009.175 | en_HK |
dc.identifier.scopus | eid_2-s2.0-78649401025 | en_HK |
dc.identifier.hkuros | 186897 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78649401025&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 23 | en_HK |
dc.identifier.issue | 1 | en_HK |
dc.identifier.spage | 64 | en_HK |
dc.identifier.epage | 78 | en_HK |
dc.identifier.isi | WOS:000284422700006 | - |
dc.publisher.place | United States | en_HK |
dc.relation.project | Computational issues in mining uncertain data | - |
dc.identifier.scopusauthorid | Tsang, S=26666352300 | en_HK |
dc.identifier.scopusauthorid | Kao, B=35221592600 | en_HK |
dc.identifier.scopusauthorid | Yip, KY=7101909946 | en_HK |
dc.identifier.scopusauthorid | Ho, WS=7402968940 | en_HK |
dc.identifier.scopusauthorid | Lee, SD=7601400741 | en_HK |
dc.identifier.citeulike | 11892932 | - |
dc.identifier.issnl | 1041-4347 | - |