Article: Decision trees for uncertain data

Title: Decision trees for uncertain data
Authors: Tsang, S; Kao, B; Yip, KY; Ho, WS; Lee, SD
Keywords: classification; data mining; decision tree; uncertain data
Issue Date: 2011
Publisher: IEEE. The Journal's web site is located at http://www.computer.org/tkde
Citation: IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23 n. 1, p. 64-78
Abstract: Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
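
To illustrate the distribution-based idea the abstract describes (using the pdf of each uncertain value rather than a single summary value when evaluating a split), here is a minimal sketch. It is not the authors' algorithm or their pruning techniques; the Gaussian model of uncertainty and the names `UncertainValue`, `prob_leq`, and `split_entropy` are illustrative assumptions. The point it shows is that a tuple with an uncertain attribute value is sent fractionally to both sides of a candidate split, weighted by the probability mass of its pdf on each side.

```python
# Minimal sketch (assumption: Gaussian pdfs for uncertain numeric values).
# Not the paper's UDT algorithm; it only demonstrates fractional tuple
# propagation across a candidate split and the resulting weighted entropy.
import math
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """An uncertain attribute value modeled here as a Gaussian pdf."""
    mean: float
    std: float

    def prob_leq(self, z: float) -> float:
        """P(value <= z), i.e., the Gaussian CDF evaluated at z."""
        return 0.5 * (1.0 + math.erf((z - self.mean) / (self.std * math.sqrt(2.0))))


def entropy(class_weights: dict) -> float:
    """Shannon entropy of a (possibly fractional) class-weight distribution."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, z):
    """Weighted entropy of the split 'value <= z' with fractional tuples.

    `tuples` is a list of (UncertainValue, class_label, weight) triples.
    Each tuple contributes weight * P(value <= z) of itself to the left
    child and the remaining probability mass to the right child.
    """
    left, right = {}, {}
    for uv, label, w in tuples:
        p_left = uv.prob_leq(z)
        left[label] = left.get(label, 0.0) + w * p_left
        right[label] = right.get(label, 0.0) + w * (1.0 - p_left)
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


if __name__ == "__main__":
    # Two classes whose uncertain values overlap; compare candidate splits.
    data = [
        (UncertainValue(1.0, 0.5), "A", 1.0),
        (UncertainValue(1.5, 0.5), "A", 1.0),
        (UncertainValue(3.0, 0.5), "B", 1.0),
        (UncertainValue(3.5, 0.5), "B", 1.0),
    ]
    for z in (1.0, 2.25, 3.0):
        print(f"split at {z}: weighted entropy = {split_entropy(data, z):.3f}")
```

In a full tree builder, a split point would be chosen to minimize this weighted entropy (maximize information gain); the paper's pruning techniques address the cost of evaluating many such candidate splits over pdfs.
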
Persistent Identifier: http://hdl.handle.net/10722/152447
ISSN: 1041-4347
2023 Impact Factor: 8.9
2023 SCImago Journal Rankings: 2.867
ISI Accession Number ID: WOS:000284422700006
Funding Agency | Grant Number
Hong Kong Research Grants Council | HKU 7134/06E

Funding Information: This research is supported by the Hong Kong Research Grants Council Grant HKU 7134/06E.


DC Field | Value | Language
dc.contributor.author | Tsang, S | en_HK
dc.contributor.author | Kao, B | en_HK
dc.contributor.author | Yip, KY | en_HK
dc.contributor.author | Ho, WS | en_HK
dc.contributor.author | Lee, SD | en_HK
dc.date.accessioned | 2012-06-26T06:39:10Z | -
dc.date.available | 2012-06-26T06:39:10Z | -
dc.date.issued | 2011 | en_HK
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2011, v. 23 n. 1, p. 64-78 | en_HK
dc.identifier.issn | 1041-4347 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/152447 | -
dc.description.abstract | Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency. | en_HK
dc.language | eng | en_US
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tkde | en_HK
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | en_HK
dc.rights | ©2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.subject | classification | en_HK
dc.subject | data mining | en_HK
dc.subject | decision tree | en_HK
dc.subject | Uncertain data | en_HK
dc.title | Decision trees for uncertain data | en_HK
dc.type | Article | en_HK
dc.identifier.email | Kao, B: kao@cs.hku.hk | en_HK
dc.identifier.email | Ho, WS: wsho@cs.hku.hk | en_HK
dc.identifier.authority | Kao, B=rp00123 | en_HK
dc.identifier.authority | Ho, WS=rp01730 | en_HK
dc.description.nature | published_or_final_version | en_US
dc.identifier.doi | 10.1109/TKDE.2009.175 | en_HK
dc.identifier.scopus | eid_2-s2.0-78649401025 | en_HK
dc.identifier.hkuros | 186897 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78649401025&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 23 | en_HK
dc.identifier.issue | 1 | en_HK
dc.identifier.spage | 64 | en_HK
dc.identifier.epage | 78 | en_HK
dc.identifier.isi | WOS:000284422700006 | -
dc.publisher.place | United States | en_HK
dc.relation.project | Computational issues in mining uncertain data | -
dc.identifier.scopusauthorid | Tsang, S=26666352300 | en_HK
dc.identifier.scopusauthorid | Kao, B=35221592600 | en_HK
dc.identifier.scopusauthorid | Yip, KY=7101909946 | en_HK
dc.identifier.scopusauthorid | Ho, WS=7402968940 | en_HK
dc.identifier.scopusauthorid | Lee, SD=7601400741 | en_HK
dc.identifier.citeulike | 11892932 | -
dc.identifier.issnl | 1041-4347 | -
