Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1016/S0306-4379(01)00046-1
- Scopus: eid_2-s2.0-0036498208
- WOS: WOS:000174193000003
Article: A lattice-based approach for I/O efficient association rule mining
Title | A lattice-based approach for I/O efficient association rule mining |
---|---|
Authors | Loo, KK; Yip, CL; Kao, B; Cheung, D |
Keywords | Apriori Association rules Data mining FindLarge Lattice LGen |
Issue Date | 2002 |
Publisher | Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is |
Citation | Information Systems, 2002, v. 27 n. 1, p. 41-74 |
Abstract | Most algorithms for association rule mining are variants of the basic Apriori algorithm (Agrawal and Srikant, Fast algorithms for mining association rules in databases, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, 1994, pp. 487-499). One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the biggest frequent itemsets. In this paper, we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We present an algorithm FindLarge which uses LGen to find frequent itemsets. We show that, given a reasonable set of suggested frequent itemsets, FindLarge can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the frequent itemsets irrespective of the size of the biggest ones. Two I/O-saving algorithms, namely DIC and Pincer-Search, are compared with FindLarge in a series of experiments. We discuss the conditions under which FindLarge significantly outperforms the others in terms of I/O efficiency. © 2002 Elsevier Science Ltd. All rights reserved. |
Persistent Identifier | http://hdl.handle.net/10722/89013 |
ISSN | 0306-4379 (2023 Impact Factor: 3.0; 2023 SCImago Journal Rankings: 1.201) |
ISI Accession Number ID | WOS:000174193000003 |
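
The level-wise behaviour that the abstract attributes to Apriori-based algorithms can be illustrated with a minimal Python sketch. It is a textbook-style rendering of basic Apriori under simplifying assumptions (in-memory transactions, an absolute support threshold), not the paper's LGen or FindLarge, whose details are not given in this record. The function name `apriori` and the `scans` counter are hypothetical names chosen for this illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Basic level-wise Apriori (illustrative sketch, not LGen/FindLarge).

    Returns a dict mapping each frequent itemset to its support count,
    plus the number of support-counting passes made over the data.
    """
    transactions = [frozenset(t) for t in transactions]
    # Assumption for brevity: the item alphabet is collected up front;
    # only support-counting passes are tallied in `scans`.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the transactions per round: all candidates
        # counted in this pass have exactly k items.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset
        # was found frequent in the round just finished.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
    itemsets, passes = apriori(data, min_support=2)
    print(passes, sorted("".join(sorted(s)) for s in itemsets))
```

On this toy input the largest frequent itemset has three items and the sketch makes three counting passes, which is the dependency on itemset size that the abstract describes. An approach that counts candidates of several sizes in the same pass, as the abstract says LGen does, would aim to break that dependency.
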
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Loo, KK | en_HK |
dc.contributor.author | Yip, CL | en_HK |
dc.contributor.author | Kao, B | en_HK |
dc.contributor.author | Cheung, D | en_HK |
dc.date.accessioned | 2010-09-06T09:51:18Z | - |
dc.date.available | 2010-09-06T09:51:18Z | - |
dc.date.issued | 2002 | en_HK |
dc.identifier.citation | Information Systems, 2002, v. 27 n. 1, p. 41-74 | en_HK |
dc.identifier.issn | 0306-4379 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/89013 | - |
dc.description.abstract | Most algorithms for association rule mining are variants of the basic Apriori algorithm (Agrawal and Srikant, Fast algorithms for mining association rules in databases, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile, 1994, pp. 487-499). One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds, with the size of the itemsets incremented by one per round. The number of database scans required by Apriori-based algorithms thus depends on the size of the biggest frequent itemsets. In this paper, we devise a more general candidate set generation algorithm, LGen, which generates candidate itemsets of multiple sizes during each database scan. We present an algorithm FindLarge which uses LGen to find frequent itemsets. We show that, given a reasonable set of suggested frequent itemsets, FindLarge can significantly reduce the number of I/O passes required. In the best cases, only two passes are sufficient to discover all the frequent itemsets irrespective of the size of the biggest ones. Two I/O-saving algorithms, namely DIC and Pincer-Search, are compared with FindLarge in a series of experiments. We discuss the conditions under which FindLarge significantly outperforms the others in terms of I/O efficiency. © 2002 Elsevier Science Ltd. All rights reserved. | en_HK |
dc.language | eng | en_HK |
dc.publisher | Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is | en_HK |
dc.relation.ispartof | Information Systems | en_HK |
dc.subject | Apriori | en_HK |
dc.subject | Association rules | en_HK |
dc.subject | Data mining | en_HK |
dc.subject | FindLarge | en_HK |
dc.subject | Lattice | en_HK |
dc.subject | LGen | en_HK |
dc.title | A lattice-based approach for I/O efficient association rule mining | en_HK |
dc.type | Article | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0306-4379&volume=27&issue=1&spage=41&epage=74&date=2002&atitle=A+Lattice-Based+Approach+for+I/O+Efficient+Association+Rule+Mining | en_HK |
dc.identifier.email | Yip, CL:clyip@cs.hku.hk | en_HK |
dc.identifier.email | Kao, B:kao@cs.hku.hk | en_HK |
dc.identifier.email | Cheung, D:dcheung@cs.hku.hk | en_HK |
dc.identifier.authority | Yip, CL=rp00205 | en_HK |
dc.identifier.authority | Kao, B=rp00123 | en_HK |
dc.identifier.authority | Cheung, D=rp00101 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1016/S0306-4379(01)00046-1 | en_HK |
dc.identifier.scopus | eid_2-s2.0-0036498208 | en_HK |
dc.identifier.hkuros | 67151 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0036498208&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 27 | en_HK |
dc.identifier.issue | 1 | en_HK |
dc.identifier.spage | 41 | en_HK |
dc.identifier.epage | 74 | en_HK |
dc.identifier.isi | WOS:000174193000003 | - |
dc.publisher.place | United Kingdom | en_HK |
dc.identifier.scopusauthorid | Loo, KK=36793892100 | en_HK |
dc.identifier.scopusauthorid | Yip, CL=7101665547 | en_HK |
dc.identifier.scopusauthorid | Kao, B=35221592600 | en_HK |
dc.identifier.scopusauthorid | Cheung, D=34567902600 | en_HK |
dc.identifier.issnl | 0306-4379 | - |