Capabilities of outlier detection schemes in large datasets, framework and methodologies

Tang, J; Chen, Z; Fu, AW; Cheung, DW

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/s10115-005-0233-6
Scopus: eid_2-s2.0-33845240405
WOS: WOS:000243390700003
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	Capabilities of outlier detection schemes in large datasets, framework and methodologies
Authors	Tang, J; Chen, Z; Fu, AW; Cheung, DW
Keywords	Connectivity-Based Outliers; Density-Based Outliers; Distance-Based Outliers; Outlier Detection; Performance Metrics; Scheme Capability
Issue Date	2007
Publisher	Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/
Citation	Knowledge And Information Systems, 2007, v. 11 n. 1, p. 45-84. DOI: http://dx.doi.org/10.1007/s10115-005-0233-6
Abstract	Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principle and practical implementation lay a foundation for some important applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, discovering computer intrusion, etc. In this paper, we first present a unified model for several existing outlier detection schemes, and propose a compatibility theory, which establishes a framework for describing the capabilities for various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable densities with the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density as an outlier. We compare density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics. © Springer-Verlag London Limited 2006.
Persistent Identifier	http://hdl.handle.net/10722/152349
ISSN	0219-1377 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.860)
ISI Accession Number ID	WOS:000243390700003
References	References in Scopus
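
The abstract above states that the schemes are evaluated in terms of recall and precision, among other metrics. The following is a minimal sketch of how those two metrics could be computed for a ranked list of outlier candidates. It assumes the detector returns object identifiers sorted from most to least outlying and that only the top m candidates are inspected. The function name `precision_recall_at_m`, the cutoff `m`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_at_m(ranked_ids, true_outliers, m):
    """Precision and recall of the top-m objects in a ranked outlier list.

    ranked_ids    -- object ids ordered from most to least outlying (assumed input)
    true_outliers -- set of ids labelled as genuine outliers (ground truth)
    m             -- number of top-ranked candidates to inspect (hypothetical cutoff)
    """
    if m <= 0:
        raise ValueError("m must be positive")
    top_m = ranked_ids[:m]
    # Count how many of the inspected candidates are genuine outliers.
    hits = sum(1 for obj in top_m if obj in true_outliers)
    precision = hits / len(top_m) if top_m else 0.0
    recall = hits / len(true_outliers) if true_outliers else 0.0
    return precision, recall


# Toy data: five ranked candidates, three genuine outliers (one never returned).
ranked = ["a", "b", "c", "d", "e"]
truth = {"a", "c", "f"}
precision, recall = precision_recall_at_m(ranked, truth, m=4)
print(precision, recall)
```

With these toy inputs, two of the four inspected candidates are genuine outliers, so precision is 2/4 and recall is 2/3. Rank power and the implementation-free metrics named in the abstract are not sketched here, since this record does not define them.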

DC Field	Value	Language
dc.contributor.author	Tang, J	en_US
dc.contributor.author	Chen, Z	en_US
dc.contributor.author	Fu, AW	en_US
dc.contributor.author	Cheung, DW	en_US
dc.date.accessioned	2012-06-26T06:37:25Z	-
dc.date.available	2012-06-26T06:37:25Z	-
dc.date.issued	2007	en_US
dc.identifier.citation	Knowledge And Information Systems, 2007, v. 11 n. 1, p. 45-84	en_US
dc.identifier.issn	0219-1377	en_US
dc.identifier.uri	http://hdl.handle.net/10722/152349	-
dc.description.abstract	Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principle and practical implementation lay a foundation for some important applications such as credit card fraud detection, discovering criminal behaviors in e-commerce, discovering computer intrusion, etc. In this paper, we first present a unified model for several existing outlier detection schemes, and propose a compatibility theory, which establishes a framework for describing the capabilities for various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable densities with the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density as an outlier. We compare density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics. © Springer-Verlag London Limited 2006.	en_US
dc.language	eng	en_US
dc.publisher	Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/	en_US
dc.relation.ispartof	Knowledge and Information Systems	en_US
dc.subject	Connectivity-Based Outliers	en_US
dc.subject	Density-Based Outliers	en_US
dc.subject	Distance-Based Outliers	en_US
dc.subject	Outlier Detection	en_US
dc.subject	Performance Metrics	en_US
dc.subject	Scheme Capability	en_US
dc.title	Capabilities of outlier detection schemes in large datasets, framework and methodologies	en_US
dc.type	Article	en_US
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_US
dc.identifier.authority	Cheung, DW=rp00101	en_US
dc.description.nature	link_to_subscribed_fulltext	en_US
dc.identifier.doi	10.1007/s10115-005-0233-6	en_US
dc.identifier.scopus	eid_2-s2.0-33845240405	en_US
dc.identifier.hkuros	135452	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-33845240405&selection=ref&src=s&origin=recordpage	en_US
dc.identifier.volume	11	en_US
dc.identifier.issue	1	en_US
dc.identifier.spage	45	en_US
dc.identifier.epage	84	en_US
dc.identifier.isi	WOS:000243390700003	-
dc.publisher.place	United Kingdom	en_US
dc.identifier.scopusauthorid	Tang, J=7404637990	en_US
dc.identifier.scopusauthorid	Chen, Z=7409485867	en_US
dc.identifier.scopusauthorid	Fu, AW=25957576800	en_US
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_US
dc.identifier.citeulike	1003623	-
dc.identifier.issnl	0219-3116	-
