
Article: Effect of data skewness and workload balance in parallel data mining

Title: Effect of data skewness and workload balance in parallel data mining
Authors: Cheung, DW; Lee, SD; Xiao, Y
Keywords: Association rules; Data mining; Data skewness; Parallel mining; Partitioning; Workload balance
Issue Date: 2002
Publisher: IEEE. The journal's web site is located at http://www.computer.org/tkde
Citation: IEEE Transactions on Knowledge and Data Engineering, 2002, v. 14 n. 3, p. 498-514
Abstract
To mine association rules efficiently, we have developed a new parallel mining algorithm, FPM, on a distributed shared-nothing parallel system in which data are partitioned across the processors. FPM is an enhancement of the FDM algorithm, which we previously proposed for distributed mining of association rules. FPM requires fewer rounds of message exchanges than FDM and, hence, has a better response time in a parallel environment. The algorithm has been experimentally found to outperform CD, a representative parallel algorithm for the same goal. The efficiency of FPM is attributed to the incorporation of two powerful candidate-set pruning techniques: distributed pruning and global pruning. The two techniques are sensitive to two data distribution characteristics: data skewness and workload balance. Metrics based on entropy are proposed for these two characteristics. The prunings are very effective when both the skewness and the balance are high. To increase the efficiency of FPM, we have developed methods to partition a database so that the resulting partitions have high balance and skewness. Experiments have shown empirically that our partitioning algorithms achieve these aims very well; in particular, the results are consistently better than those of a random partitioning. Moreover, the partitioning algorithms incur little overhead. So, using our partitioning algorithms and FPM together, we can mine association rules from a database efficiently.
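The abstract does not state the entropy-based formulas. As a rough illustration only, and not the paper's definitions, the sketch below shows one way such metrics could be built: skewness as one minus the normalized entropy of an itemset's support counts across partitions, and balance as the normalized entropy of per-partition workload. The function names and the exact normalization are assumptions made for this example.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to [0, 1] by log(n).

    Assumption for this sketch: an empty distribution, or one with fewer
    than two partitions, is treated as perfectly even (returns 1.0).
    """
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        return 1.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n)


def itemset_skewness(support_per_partition):
    """Hypothetical skewness: near 1 when one partition holds all the support,
    near 0 when the support is spread evenly across partitions."""
    return 1.0 - normalized_entropy(support_per_partition)


def workload_balance(work_per_partition):
    """Hypothetical balance: near 1 when every partition carries equal work,
    lower when the work is concentrated on a few partitions."""
    return normalized_entropy(work_per_partition)


if __name__ == "__main__":
    # Support counts of one itemset on four processors (made-up numbers).
    print(itemset_skewness([10, 0, 0, 0]))   # concentrated support
    print(itemset_skewness([5, 5, 5, 5]))    # evenly spread support
    # Amount of counting work assigned to each processor (made-up numbers).
    print(workload_balance([3, 1]))
```

Under these assumed definitions, a partitioning that concentrates each itemset on few processors while keeping total work even would score high on both measures, which is the situation the abstract describes as favorable for the prunings.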
Persistent Identifier: http://hdl.handle.net/10722/43659
ISSN: 1041-4347
2013 Impact Factor: 1.815
2013 SCImago Journal Rankings: 1.763
ISI Accession Number ID: WOS:000175317300003

 

DC Field | Value | Language
dc.contributor.author | Cheung, DW | en_HK
dc.contributor.author | Lee, SD | en_HK
dc.contributor.author | Xiao, Y | en_HK
dc.date.accessioned | 2007-03-23T04:51:26Z | -
dc.date.available | 2007-03-23T04:51:26Z | -
dc.date.issued | 2002 | en_HK
dc.identifier.citation | IEEE Transactions on Knowledge and Data Engineering, 2002, v. 14 n. 3, p. 498-514 | en_HK
dc.identifier.issn | 1041-4347 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/43659 | -
dc.description.abstract | To mine association rules efficiently, we have developed a new parallel mining algorithm, FPM, on a distributed shared-nothing parallel system in which data are partitioned across the processors. FPM is an enhancement of the FDM algorithm, which we previously proposed for distributed mining of association rules. FPM requires fewer rounds of message exchanges than FDM and, hence, has a better response time in a parallel environment. The algorithm has been experimentally found to outperform CD, a representative parallel algorithm for the same goal. The efficiency of FPM is attributed to the incorporation of two powerful candidate-set pruning techniques: distributed pruning and global pruning. The two techniques are sensitive to two data distribution characteristics: data skewness and workload balance. Metrics based on entropy are proposed for these two characteristics. The prunings are very effective when both the skewness and the balance are high. To increase the efficiency of FPM, we have developed methods to partition a database so that the resulting partitions have high balance and skewness. Experiments have shown empirically that our partitioning algorithms achieve these aims very well; in particular, the results are consistently better than those of a random partitioning. Moreover, the partitioning algorithms incur little overhead. So, using our partitioning algorithms and FPM together, we can mine association rules from a database efficiently. | en_HK
dc.format.extent | 434493 bytes | -
dc.format.extent | 26624 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | application/msword | -
dc.language | eng | en_HK
dc.publisher | IEEE. The journal's web site is located at http://www.computer.org/tkde | en_HK
dc.relation.ispartof | IEEE Transactions on Knowledge and Data Engineering | en_HK
dc.rights | ©2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | en_HK
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject | Association rules | en_HK
dc.subject | Data mining | en_HK
dc.subject | Data skewness | en_HK
dc.subject | Parallel mining | en_HK
dc.subject | Partitioning | en_HK
dc.subject | Workload balance | en_HK
dc.title | Effect of data skewness and workload balance in parallel data mining | en_HK
dc.type | Article | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1041-4347&volume=14&issue=3&spage=498&epage=514&date=2002&atitle=Effect+of+data+skewness+and+workload+balance+in+parallel+data+mining | en_HK
dc.identifier.email | Cheung, DW:dcheung@cs.hku.hk | en_HK
dc.identifier.authority | Cheung, DW=rp00101 | en_HK
dc.description.nature | published_or_final_version | en_HK
dc.identifier.doi | 10.1109/TKDE.2002.1000339 | en_HK
dc.identifier.scopus | eid_2-s2.0-0036565561 | en_HK
dc.identifier.hkuros | 70955 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0036565561&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 14 | en_HK
dc.identifier.issue | 3 | en_HK
dc.identifier.spage | 498 | en_HK
dc.identifier.epage | 514 | en_HK
dc.identifier.isi | WOS:000175317300003 | -
dc.publisher.place | United States | en_HK
dc.identifier.scopusauthorid | Cheung, DW=34567902600 | en_HK
dc.identifier.scopusauthorid | Lee, SD=37056848600 | en_HK
dc.identifier.scopusauthorid | Xiao, Y=22735880100 | en_HK
dc.identifier.citeulike | 8355097 | -
