Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1007/s10115-005-0233-6
- Scopus: eid_2-s2.0-33845240405
- WOS: WOS:000243390700003
Article: Capabilities of outlier detection schemes in large datasets, framework and methodologies
Title | Capabilities of outlier detection schemes in large datasets, framework and methodologies |
---|---|
Authors | Tang, J; Chen, Z; Fu, AW; Cheung, DW |
Keywords | Connectivity-Based Outliers; Density-Based Outliers; Distance-Based Outliers; Outlier Detection; Performance Metrics; Scheme Capability |
Issue Date | 2007 |
Publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ |
Citation | Knowledge and Information Systems, 2007, v. 11 n. 1, p. 45-84 |
Abstract | Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementations lay a foundation for important applications such as credit card fraud detection, discovery of criminal behavior in e-commerce, and computer intrusion detection. In this paper, we first present a unified model for several existing outlier detection schemes and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when patterns have densities comparable to those of the outliers. We then introduce a connectivity-based scheme that improves on the density-based scheme when a pattern itself has a density similar to that of an outlier. We compare the density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each is more effective than the other. Finally, the connectivity-based and density-based schemes are evaluated comparatively on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics. © Springer-Verlag London Limited 2006. |
Persistent Identifier | http://hdl.handle.net/10722/152349 |
ISSN | 0219-1377 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.860) |
ISI Accession Number ID | WOS:000243390700003 |
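
The abstract's claim that a density-based scheme can outperform a distance-based one when a dataset mixes patterns of very different densities can be illustrated with a small sketch. This is not the paper's unified model, compatibility theory or connectivity-based scheme: the distance-based score is assumed to be the distance to the k-th nearest neighbor, and the density-based score is the widely used local outlier factor (LOF). The toy dataset (a tight grid, a loose grid and one hypothetical point near the tight grid), the helper names and the choice k = 3 are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def knn(points: List[Point], k: int) -> Tuple[Dict[int, float], Dict[int, List[int]]]:
    """k-distance of each point and its k-distance neighborhood (ties included)."""
    kdist: Dict[int, float] = {}
    neigh: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        kdist[i] = others[k - 1][0]
        neigh[i] = [j for d, j in others if d <= kdist[i]]
    return kdist, neigh


def distance_scores(points: List[Point], k: int) -> List[float]:
    """Distance-based score (assumed form): distance to the k-th nearest neighbor."""
    kdist, _ = knn(points, k)
    return [kdist[i] for i in range(len(points))]


def lof_scores(points: List[Point], k: int) -> List[float]:
    """Density-based score: local outlier factor. Assumes all points are distinct."""
    kdist, neigh = knn(points, k)

    def reach(i: int, j: int) -> float:
        # Reachability distance of point i with respect to neighbor j.
        return max(kdist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = {i: len(neigh[i]) / sum(reach(i, j) for j in neigh[i]) for i in neigh}
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(len(points))]


def rank_of(scores: List[float], idx: int) -> int:
    """1-based rank of scores[idx] when sorted from most to least outlying."""
    return sorted(scores, reverse=True).index(scores[idx]) + 1


# Hypothetical toy data: a tight grid, a loose grid, and one point near the tight grid.
dense = [(x, y) for x in range(3) for y in range(3)]                          # spacing 1
sparse = [(100 + 20 * x, 100 + 20 * y) for x in range(3) for y in range(3)]   # spacing 20
points = dense + sparse + [(10, 10)]
outlier = len(points) - 1
k = 3

d_rank = rank_of(distance_scores(points, k), outlier)
l_rank = rank_of(lof_scores(points, k), outlier)
print(f"k-distance rank of (10, 10): {d_rank}")
print(f"LOF rank of (10, 10): {l_rank}")
```

Under these assumptions, the k-distance score places the point (10, 10) behind all nine points of the loose grid, because those points have larger absolute neighbor distances, while LOF ranks it first because it compares each point's density with the densities of its own neighbors.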
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tang, J | en_US |
dc.contributor.author | Chen, Z | en_US |
dc.contributor.author | Fu, AW | en_US |
dc.contributor.author | Cheung, DW | en_US |
dc.date.accessioned | 2012-06-26T06:37:25Z | - |
dc.date.available | 2012-06-26T06:37:25Z | - |
dc.date.issued | 2007 | en_US |
dc.identifier.citation | Knowledge and Information Systems, 2007, v. 11 n. 1, p. 45-84 | en_US |
dc.identifier.issn | 0219-1377 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/152349 | - |
dc.language | eng | en_US |
dc.publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ | en_US |
dc.relation.ispartof | Knowledge and Information Systems | en_US |
dc.subject | Connectivity-Based Outliers | en_US |
dc.subject | Density-Based Outliers | en_US |
dc.subject | Distance-Based Outliers | en_US |
dc.subject | Outlier Detection | en_US |
dc.subject | Performance Metrics | en_US |
dc.subject | Scheme Capability | en_US |
dc.title | Capabilities of outlier detection schemes in large datasets, framework and methodologies | en_US |
dc.type | Article | en_US |
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_US |
dc.identifier.authority | Cheung, DW=rp00101 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.doi | 10.1007/s10115-005-0233-6 | en_US |
dc.identifier.scopus | eid_2-s2.0-33845240405 | en_US |
dc.identifier.hkuros | 135452 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-33845240405&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.volume | 11 | en_US |
dc.identifier.issue | 1 | en_US |
dc.identifier.spage | 45 | en_US |
dc.identifier.epage | 84 | en_US |
dc.identifier.isi | WOS:000243390700003 | - |
dc.publisher.place | United Kingdom | en_US |
dc.identifier.scopusauthorid | Tang, J=7404637990 | en_US |
dc.identifier.scopusauthorid | Chen, Z=7409485867 | en_US |
dc.identifier.scopusauthorid | Fu, AW=25957576800 | en_US |
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_US |
dc.identifier.citeulike | 1003623 | - |
dc.identifier.issnl | 0219-3116 | - |