Article: Capabilities of outlier detection schemes in large datasets, framework and methodologies

Title: Capabilities of outlier detection schemes in large datasets, framework and methodologies
Authors: Tang, J; Chen, Z; Fu, AW; Cheung, DW
Keywords: Connectivity-Based Outliers; Density-Based Outliers; Distance-Based Outliers; Outlier Detection; Performance Metrics; Scheme Capability
Issue Date: 2007
Publisher: Springer-Verlag London Ltd. The journal's website is located at http://link.springer.de/link/service/journals/10115/
Citation: Knowledge and Information Systems, 2007, v. 11 n. 1, p. 45-84
Abstract: Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementations lay the foundation for important applications such as credit card fraud detection, discovery of criminal behavior in e-commerce, and computer intrusion detection. In this paper, we first present a unified model for several existing outlier detection schemes and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of how well they match users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns have densities comparable to those of the outliers. We then introduce a connectivity-based scheme that improves on the density-based scheme when a pattern itself has a density similar to that of an outlier. We compare the density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, the connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics. © Springer-Verlag London Limited 2006.
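The abstract compares distance-based, density-based and connectivity-based outlier scores and evaluates them by precision, recall and rank power, but this record gives no definitions. As a rough illustration only, the Python sketch below implements one commonly cited form of each: the distance to the k-th nearest neighbour (distance-based), the local outlier factor LOF (density-based), and a connectivity-based outlier factor COF built on set-based nearest paths, plus top-m precision, recall and rank power. All function names are hypothetical, the formulas are assumptions based on the usual textbook definitions rather than a transcription of the paper, and the brute-force O(n²) distance matrix suits only small examples. For simplicity each k-neighbourhood holds exactly k points (ties broken by index) and duplicate points are assumed absent.

```python
import numpy as np


def knn(X, k):
    """Pairwise distances and indices of each point's k nearest neighbours (self excluded)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbour
    idx = np.argsort(D, axis=1, kind="stable")[:, :k]  # ties broken by index (simplification)
    return D, idx


def knn_distance_score(X, k):
    """Distance-based score: distance to the k-th nearest neighbour."""
    D, idx = knn(X, k)
    return D[np.arange(len(X)), idx[:, -1]]


def lof(X, k):
    """Density-based score: local outlier factor (assumes no duplicate points)."""
    D, idx = knn(X, k)
    n = len(X)
    kdist = D[np.arange(n), idx[:, -1]]  # k-distance of every point
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each o in N_k(p)
    reach = np.maximum(kdist[idx], D[np.arange(n)[:, None], idx])
    lrd = 1.0 / reach.mean(axis=1)  # local reachability density
    return lrd[idx].mean(axis=1) / lrd


def ac_dist(D, members):
    """Average chaining distance along a set-based nearest (SBN) path.

    members[0] is the query point; the rest are its k nearest neighbours.
    """
    chosen, remaining, costs = [members[0]], list(members[1:]), []
    while remaining:
        # next point: the remaining one closest to any point already on the path
        gaps = [min(D[c, q] for c in chosen) for q in remaining]
        j = int(np.argmin(gaps))
        costs.append(gaps[j])
        chosen.append(remaining.pop(j))
    r = len(members)
    # earlier edges get larger weights; the weights sum to 1
    weights = [2.0 * (r - i) / (r * (r - 1)) for i in range(1, r)]
    return float(np.dot(weights, costs))


def cof(X, k):
    """Connectivity-based score: ac-dist of p over the mean ac-dist of its neighbours."""
    D, idx = knn(X, k)
    ac = np.array([ac_dist(D, [p, *idx[p]]) for p in range(len(X))])
    return ac / ac[idx].mean(axis=1)


def precision_recall_rankpower(scores, true_outliers, m):
    """Evaluate the top-m points ranked by descending outlier score."""
    top = np.argsort(-scores, kind="stable")[:m]
    ranks = [r + 1 for r, p in enumerate(top) if p in true_outliers]  # 1-based ranks of hits
    hits = len(ranks)
    precision = hits / m
    recall = hits / len(true_outliers)
    # assumed rank-power form: hits(hits + 1) / (2 * sum of ranks); 1.0 when hits fill the top ranks
    rank_power = hits * (hits + 1) / (2 * sum(ranks)) if ranks else 0.0
    return precision, recall, rank_power


if __name__ == "__main__":
    # a 3x3 grid (a regular pattern) plus one distant point at index 9
    X = np.array([[x, y] for x in range(3) for y in range(3)] + [[10, 10]], dtype=float)
    for name, score in [("kNN distance", knn_distance_score), ("LOF", lof), ("COF", cof)]:
        print(name, int(np.argmax(score(X, 3))))
```

On this toy grid all three scores rank the distant point first. The situations the abstract describes, where the schemes disagree, would need datasets mixing patterns of different densities or low-density linear patterns.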
Persistent Identifier: http://hdl.handle.net/10722/152349
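ISSN: 0219-1377
2021 Impact Factor: 2.531
2020 SCImago Journal Rankings: 0.634
ISI Accession Number ID: WOS:000243390700003
References: http://www.scopus.com/mlt/select.url?eid=2-s2.0-33845240405&selection=ref&src=s&origin=recordpage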

DC Field | Value | Language
dc.contributor.author | Tang, J | en_US
dc.contributor.author | Chen, Z | en_US
dc.contributor.author | Fu, AW | en_US
dc.contributor.author | Cheung, DW | en_US
dc.date.accessioned | 2012-06-26T06:37:25Z | -
dc.date.available | 2012-06-26T06:37:25Z | -
dc.date.issued | 2007 | en_US
dc.identifier.citation | Knowledge And Information Systems, 2007, v. 11 n. 1, p. 45-84 | en_US
dc.identifier.issn | 0219-1377 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/152349 | -
dc.language | eng | en_US
dc.publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ | en_US
dc.relation.ispartof | Knowledge and Information Systems | en_US
dc.subject | Connectivity-Based Outliers | en_US
dc.subject | Density-Based Outliers | en_US
dc.subject | Distance-Based Outliers | en_US
dc.subject | Outlier Detection | en_US
dc.subject | Performance Metrics | en_US
dc.subject | Scheme Capability | en_US
dc.title | Capabilities of outlier detection schemes in large datasets, framework and methodologies | en_US
dc.type | Article | en_US
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_US
dc.identifier.authority | Cheung, DW=rp00101 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.doi | 10.1007/s10115-005-0233-6 | en_US
dc.identifier.scopus | eid_2-s2.0-33845240405 | en_US
dc.identifier.hkuros | 135452 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-33845240405&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 11 | en_US
dc.identifier.issue | 1 | en_US
dc.identifier.spage | 45 | en_US
dc.identifier.epage | 84 | en_US
dc.identifier.isi | WOS:000243390700003 | -
dc.publisher.place | United Kingdom | en_US
dc.identifier.scopusauthorid | Tang, J=7404637990 | en_US
dc.identifier.scopusauthorid | Chen, Z=7409485867 | en_US
dc.identifier.scopusauthorid | Fu, AW=25957576800 | en_US
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_US
dc.identifier.citeulike | 1003623 | -
dc.identifier.issnl | 0219-3116 | -
