Efficient algorithms for mining and incremental update of maximal frequent sequences

Kao, B; Zhang, M; Yip, CL; Cheung, DW

File Download

There are no files associated with this item.

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/s10618-005-0268-z
Scopus: eid_2-s2.0-24044460806
WOS: WOS:000228970700001

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	Efficient algorithms for mining and incremental update of maximal frequent sequences
Authors	Kao, B; Zhang, M; Yip, CL; Cheung, DW
Keywords	Data mining; Incremental update; Sequence
Issue Date	2005
Publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810
Citation	Data Mining And Knowledge Discovery, 2005, v. 10 n. 2, p. 87-116. DOI: http://dx.doi.org/10.1007/s10618-005-0268-z
Abstract	We study two problems: (1) mining frequent sequences from a transactional database, and (2) incremental update of frequent sequences when the underlying database changes over time. We review existing sequence mining algorithms, including GSP, PrefixSpan, SPADE, and ISM. We point out the large memory requirement of PrefixSpan, SPADE, and ISM, and evaluate the performance of GSP. We discuss the high I/O cost of GSP, particularly when the database contains long frequent sequences. To reduce the I/O requirement, we propose an algorithm, MFS, which can be considered a generalization of GSP. The general strategy of MFS is to first find an approximate solution to the set of frequent sequences and then perform successive refinement until the exact set of frequent sequences is obtained. We show that this successive refinement approach results in a significant improvement in I/O cost. We discuss how MFS can be applied to the incremental update problem. In particular, the result of a previous mining exercise can be used (by MFS) as a good initial approximate solution for the mining of an updated database. This results in an I/O-efficient algorithm. To improve processing efficiency, we devise pruning techniques that, when coupled with GSP or MFS, result in algorithms that are both CPU- and I/O-efficient. © 2005 Springer Science + Business Media, Inc.
Persistent Identifier	http://hdl.handle.net/10722/88972
ISSN	1384-5810 (2023 Impact Factor: 2.8; 2023 SCImago Journal Rankings: 1.813)
ISI Accession Number ID	WOS:000228970700001
References	References in Scopus
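The abstract contrasts GSP, which scans the database once per candidate length, with MFS, which starts from an approximate solution and refines it. The toy sketch below illustrates that contrast under stated assumptions. It is not the authors' MFS algorithm or their pruning techniques. Sequences are modelled as tuples of single items rather than sequences of itemsets, a "database scan" is simulated by the hypothetical `count_pass`, and every function name (`contains`, `join`, `gsp_like`, `refine_from_seed`) is illustrative only. `refine_from_seed` counts all subsequences of a seed (for example, a previous mining result) in one scan, then counts candidates of several lengths together in each later scan until no new candidates remain.

```python
from itertools import combinations

# Assumption: a sequence is a tuple of single items (no itemsets).
EXAMPLE_DB = [
    ("a", "b", "c", "d"),
    ("a", "b", "c", "d"),
    ("a", "c", "d"),
    ("b", "d"),
]


def contains(data_seq, pattern):
    """True if pattern is a (not necessarily contiguous) subsequence of data_seq."""
    it = iter(data_seq)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def count_pass(db, candidates):
    """One simulated database scan: exact support of every candidate, any length."""
    support = {c: 0 for c in candidates}
    for data_seq in db:
        for c in candidates:
            if contains(data_seq, c):
                support[c] += 1
    return support


def join(frequent_k):
    """GSP-style join of length-k sequences into (k+1)-candidates, keeping only
    candidates whose every k-subsequence is in frequent_k (apriori pruning)."""
    out = set()
    for s1 in frequent_k:
        for s2 in frequent_k:
            if s1[1:] == s2[:-1]:
                cand = s1 + s2[-1:]
                subs = {cand[:i] + cand[i + 1:] for i in range(len(cand))}
                if subs <= frequent_k:
                    out.add(cand)
    return out


def gsp_like(db, min_sup):
    """Level-wise baseline: one scan per candidate length."""
    candidates = {(x,) for s in db for x in s}
    frequent, passes = {}, 0
    while candidates:
        support = count_pass(db, candidates)
        passes += 1
        level = {c for c, n in support.items() if n >= min_sup}
        frequent.update({c: support[c] for c in level})
        candidates = join(level)
    return frequent, passes


def subsequences(seq):
    """All non-empty subsequences of seq (exponential; fine for a toy example)."""
    return {tuple(seq[i] for i in idx)
            for k in range(1, len(seq) + 1)
            for idx in combinations(range(len(seq)), k)}


def refine_from_seed(db, min_sup, seed):
    """Hypothetical successive-refinement sketch seeded by an approximate answer."""
    candidates = {(x,) for s in db for x in s}
    for s in seed:
        candidates |= subsequences(s)
    counted, frequent, passes = set(), {}, 0
    while candidates:
        support = count_pass(db, candidates)
        passes += 1
        counted |= candidates
        frequent.update({c: n for c, n in support.items() if n >= min_sup})
        by_len = {}
        for c in frequent:
            by_len.setdefault(len(c), set()).add(c)
        candidates = set()
        for level in by_len.values():
            candidates |= join(level)  # extend every length in the same scan
        candidates -= counted           # never recount a sequence
    return frequent, passes


if __name__ == "__main__":
    exact, gsp_passes = gsp_like(EXAMPLE_DB, 2)
    seeded, seeded_passes = refine_from_seed(EXAMPLE_DB, 2, [("a", "b", "c", "d")])
    assert seeded == exact
    print(len(exact), gsp_passes, seeded_passes)  # 15 4 2
```

With the exact maximal sequence as the seed, this toy run needs two scans instead of four. A poorer seed such as `("a", "b", "c")` still converges to the same frequent set, only with more scans. The sketch makes no claim about the I/O savings reported in the paper, which depend on the data and on how candidate counting is organized.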

DC Field	Value	Language
dc.contributor.author	Kao, B	en_HK
dc.contributor.author	Zhang, M	en_HK
dc.contributor.author	Yip, CL	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.date.accessioned	2010-09-06T09:50:47Z	-
dc.date.available	2010-09-06T09:50:47Z	-
dc.date.issued	2005	en_HK
dc.identifier.citation	Data Mining And Knowledge Discovery, 2005, v. 10 n. 2, p. 87-116	en_HK
dc.identifier.issn	1384-5810	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/88972	-
dc.language	eng	en_HK
dc.publisher	Springer New York LLC. The Journal's web site is located at http://springerlink.metapress.com/openurl.asp?genre=journal&issn=1384-5810	en_HK
dc.relation.ispartof	Data Mining and Knowledge Discovery	en_HK
dc.subject	Data mining	en_HK
dc.subject	Incremental update	en_HK
dc.subject	Sequence	en_HK
dc.title	Efficient algorithms for mining and incremental update of maximal frequent sequences	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1384-5810&volume=10&spage=87&epage=116&date=2005&atitle=Efficient+Algorithms+for+Mining+and+Incremental+update+of+Maximal+Frequent+Sequences	en_HK
dc.identifier.email	Kao, B:kao@cs.hku.hk	en_HK
dc.identifier.email	Yip, CL:clyip@cs.hku.hk	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Kao, B=rp00123	en_HK
dc.identifier.authority	Yip, CL=rp00205	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/s10618-005-0268-z	en_HK
dc.identifier.scopus	eid_2-s2.0-24044460806	en_HK
dc.identifier.hkuros	129357	en_HK
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-24044460806&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	10	en_HK
dc.identifier.issue	2	en_HK
dc.identifier.spage	87	en_HK
dc.identifier.epage	116	en_HK
dc.identifier.isi	WOS:000228970700001	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Kao, B=35221592600	en_HK
dc.identifier.scopusauthorid	Zhang, M=20434954000	en_HK
dc.identifier.scopusauthorid	Yip, CL=7101665547	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.citeulike	196576	-
dc.identifier.issnl	1384-5810	-
