Parallel mining of outliers in large database

Hung, E; Cheung, DW

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1023/A:1015608814486
Scopus: eid_2-s2.0-0036644801
WOS: WOS:000175855700001

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Parallel mining of outliers in large database

Title	Parallel mining of outliers in large database
Authors	Hung, E; Cheung, DW
Keywords	Data mining; Outlier detection; Parallel algorithm
Issue Date	2002
Publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=0926-8782
Citation	Distributed And Parallel Databases, 2002, v. 12 n. 1, p. 5-26. DOI: http://dx.doi.org/10.1023/A:1015608814486
Abstract	Data mining is a new, important and fast-growing database application. Outlier (exception) detection is one kind of data mining task, which can be applied in a variety of areas such as monitoring credit card fraud and criminal activities in electronic commerce. With the ever-increasing size and number of attributes (dimensions) of databases, previously proposed detection methods for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear in the dimensionality but quadratic in the dataset size, inducing an unacceptable cost for large datasets. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the performance improvement of PENL is linear in the number of processors, as shown by a performance comparison between ENL and PENL using the Bulk Synchronous Parallel (BSP) model. The improvement is further verified by experiments on an IBM 9076 SP2 parallel computer system. The results show that PENL is a very good choice for mining outliers in a cluster of low-cost workstations interconnected by a commodity communication network.
Persistent Identifier	http://hdl.handle.net/10722/89005
ISSN	0926-8782 (2023 Impact Factor: 1.5; 2023 SCImago Journal Rankings: 0.442)
ISI Accession Number ID	WOS:000175855700001
References	References in Scopus
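The nested-loop cost described in the abstract can be illustrated with a short sketch. It assumes the distance-based DB(p, D) definition from Knorr and Ng (an object is an outlier if at least a fraction p of the objects lie farther than distance D from it), Euclidean distance, and the boundary convention that an object is an outlier when at most M = floor(N(1 - p)) other objects lie within D. The function names (`is_db_outlier`, `nested_loop_outliers`, `partitioned_outliers`) are hypothetical, and the partitioned version is a generic split of the outer loop across workers, not the paper's ENL or PENL algorithm.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # One distance costs O(k) for k dimensions, hence "linear in dimensionality".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_db_outlier(i: int, data: Sequence[Point], p: float, d: float) -> bool:
    """Inner loop: count other objects within distance d of data[i]."""
    n = len(data)
    # Assumed convention: outlier iff at most M = floor(N * (1 - p)) other
    # objects lie within d. The epsilon absorbs float error such as
    # 10 * (1 - 0.9) == 0.9999999999999998.
    max_neighbours = math.floor(n * (1 - p) + 1e-9)
    count = 0
    for j in range(n):
        if j != i and euclidean(data[i], data[j]) <= d:
            count += 1
            if count > max_neighbours:
                return False  # early exit: too many close neighbours
    return True


def nested_loop_outliers(data: Sequence[Point], p: float, d: float) -> List[int]:
    """Outer loop over all objects: O(k * N^2) distance work in the worst case."""
    return [i for i in range(len(data)) if is_db_outlier(i, data, p, d)]


def partitioned_outliers(
    data: Sequence[Point], p: float, d: float, workers: int = 4
) -> List[int]:
    """Hypothetical split: each worker owns a contiguous slice of the outer
    loop and scans the whole dataset in its inner loop."""
    n = len(data)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]

    def scan(bound):
        lo, hi = bound
        return [i for i in range(lo, hi) if is_db_outlier(i, data, p, d)]

    # Threads only illustrate the partitioning; the GIL prevents real
    # CPU-bound speedup, so processes or separate machines would be needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(scan, bounds))  # map preserves slice order
    return [i for part in parts for i in part]


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(nested_loop_outliers(data, p=0.7, d=1.5))  # prints [4]
print(partitioned_outliers(data, p=0.7, d=1.5, workers=2))  # prints [4]
```

Because the outer-loop slices are independent and only read the shared dataset, an idealised split like this needs no communication until the final gather, which suggests why such a scheme can, in principle, scale roughly linearly with the number of processors. On a real cluster the dataset itself would also have to be distributed among the nodes, a cost this sketch ignores.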

DC Field	Value	Language
dc.contributor.author	Hung, E	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.date.accessioned	2010-09-06T09:51:12Z	-
dc.date.available	2010-09-06T09:51:12Z	-
dc.date.issued	2002	en_HK
dc.identifier.citation	Distributed And Parallel Databases, 2002, v. 12 n. 1, p. 5-26	en_HK
dc.identifier.issn	0926-8782	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/89005	-
dc.description.abstract	Data mining is a new, important and fast-growing database application. Outlier (exception) detection is one kind of data mining task, which can be applied in a variety of areas such as monitoring credit card fraud and criminal activities in electronic commerce. With the ever-increasing size and number of attributes (dimensions) of databases, previously proposed detection methods for two dimensions are no longer applicable. The time complexity of the Nested-Loop (NL) algorithm (Knorr and Ng, in Proc. 24th VLDB, 1998) is linear in the dimensionality but quadratic in the dataset size, inducing an unacceptable cost for large datasets. A more efficient version (ENL) and its parallel version (PENL) are introduced. In theory, the performance improvement of PENL is linear in the number of processors, as shown by a performance comparison between ENL and PENL using the Bulk Synchronous Parallel (BSP) model. The improvement is further verified by experiments on an IBM 9076 SP2 parallel computer system. The results show that PENL is a very good choice for mining outliers in a cluster of low-cost workstations interconnected by a commodity communication network.	en_HK
dc.language	eng	en_HK
dc.publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=0926-8782	en_HK
dc.relation.ispartof	Distributed and Parallel Databases	en_HK
dc.subject	Data mining	en_HK
dc.subject	Outlier detection	en_HK
dc.subject	Parallel algorithm	en_HK
dc.title	Parallel mining of outliers in large database	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0926-8782&volume=12&spage=5&epage=26&date=2002&atitle=Parallel+Mining+of+Outliers+in+Large+Database	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1023/A:1015608814486	en_HK
dc.identifier.scopus	eid_2-s2.0-0036644801	en_HK
dc.identifier.hkuros	70917	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-0036644801&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	12	en_HK
dc.identifier.issue	1	en_HK
dc.identifier.spage	5	en_HK
dc.identifier.epage	26	en_HK
dc.identifier.isi	WOS:000175855700001	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Hung, E=7004256336	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.issnl	0926-8782	-