Effect of Data Skewness in Parallel Mining of Association Rules

Cheung, DWL; Xiao, Y

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/3-540-64383-4_5
Scopus: eid_2-s2.0-84958976005

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Effect of Data Skewness in Parallel Mining of Association Rules

Title	Effect of Data Skewness in Parallel Mining of Association Rules
Authors	Cheung, DWL; Xiao, Y
Keywords	Association Rules; Data Mining; Data Skewness; Parallel Computing
Issue Date	1998
Publisher	Springer
Citation	The 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, 15-17 April 1998. In Wu, X, Kotagiri, R and Korb, KB (Eds). Pacific-Asia Conference on Knowledge Discovery and Data Mining, p. 48-60. Berlin, Heidelberg: Springer, 1998. DOI: http://dx.doi.org/10.1007/3-540-64383-4_5
Abstract	An efficient parallel algorithm, FPM (Fast Parallel Mining), is proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its simple communication scheme performs only one round of message exchange in each iteration. We found that the two pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good speedup, scaleup and sizeup.
Persistent Identifier	http://hdl.handle.net/10722/93133
ISBN	978-3-540-64383-8
ISSN	0302-9743 (2023 SCImago Journal Rankings: 0.606)
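
The count distribution idea mentioned in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the FPM algorithm itself: the function names, the toy partitions, and the 50% support threshold are all hypothetical, and every k-itemset is counted directly instead of being generated Apriori-style. Each partition counts itemsets locally, the counts are summed once (standing in for the single round of message exchange per iteration), and the result shows a general property that pruning of this kind can exploit: an itemset that is globally large must be locally large in at least one partition.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset occurring in one partition's transactions."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def count_distribution_pass(partitions, k, min_support):
    """Simulate one iteration: count locally, then sum the counts once."""
    per_partition = [local_counts(p, k) for p in partitions]

    # Summing the local counters stands in for the one message exchange.
    global_counts = Counter()
    for counts in per_partition:
        global_counts.update(counts)

    total = sum(len(p) for p in partitions)
    globally_large = {
        s for s, c in global_counts.items() if c >= min_support * total
    }
    # Itemsets meeting the same relative threshold inside each partition.
    locally_large = [
        {s for s, c in counts.items() if c >= min_support * len(p)}
        for counts, p in zip(per_partition, partitions)
    ]
    return globally_large, locally_large


# Hypothetical toy data: the two partitions hold different item mixes,
# i.e. the itemset distribution is skewed across partitions.
partitions = [
    [["a", "b"], ["a", "b"], ["a", "b", "c"], ["a", "c"]],
    [["c", "d"], ["c", "d"], ["d", "e"], ["a", "b"]],
]
globally_large, locally_large = count_distribution_pass(
    partitions, k=2, min_support=0.5
)
print(globally_large)
print(locally_large)
```

With skewed partitions like these, few itemsets are locally large in any one partition, which gives an intuition for why the pruning techniques described in the abstract are sensitive to data skewness.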

DC Field	Value	Language
dc.contributor.author	Cheung, DWL	en_HK
dc.contributor.author	Xiao, Y	en_HK
dc.date.accessioned	2010-09-25T14:51:53Z	-
dc.date.available	2010-09-25T14:51:53Z	-
dc.date.issued	1998	en_HK
dc.identifier.citation	The 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, 15-17 April 1998. In Wu, X, Kotagiri, R and Korb, KB (Eds). Pacific-Asia Conference on Knowledge Discovery and Data Mining, p. 48-60. Berlin, Heidelberg: Springer, 1998	-
dc.identifier.isbn	978-3-540-64383-8	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10722/93133	-
dc.description.abstract	An efficient parallel algorithm, FPM (Fast Parallel Mining), is proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its simple communication scheme performs only one round of message exchange in each iteration. We found that the two pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high. Global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between the effectiveness of the two pruning techniques and data skewness. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good speedup, scaleup and sizeup.	-
dc.language	eng	en_HK
dc.publisher	Springer.	-
dc.relation.ispartof	Pacific-Asia Conference on Knowledge Discovery and Data Mining	en_HK
dc.subject	Association Rules	-
dc.subject	Data Mining	-
dc.subject	Data Skewness	-
dc.subject	Parallel Computing	-
dc.title	Effect of Data Skewness in Parallel Mining of Association Rules	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Cheung, DWL: dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheung, DWL=rp00101	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/3-540-64383-4_5	-
dc.identifier.scopus	eid_2-s2.0-84958976005	-
dc.identifier.hkuros	31077	en_HK
dc.identifier.eissn	1611-3349	-
dc.identifier.issnl	0302-9743	-
