Model-based probabilistic frequent itemset mining

Bernecker, T; Cheng, R; Cheung, DW; Kriegel, HP; Lee, SD; Renz, M; Verhein, F; Wang, L; Zuefle, A

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1007/s10115-012-0561-2
Scopus: eid_2-s2.0-84884587755
WOS: WOS:000324652900008
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Model-based probabilistic frequent itemset mining

Title	Model-based probabilistic frequent itemset mining
Authors	Bernecker, T; Cheng, R; Cheung, DW; Kriegel, HP; Lee, SD; Renz, M; Verhein, F; Wang, L; Zuefle, A
Issue Date	2013
Publisher	Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/
Citation	Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217. DOI: http://dx.doi.org/10.1007/s10115-012-0561-2
Abstract	Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s).
Persistent Identifier	http://hdl.handle.net/10722/165826
ISSN	0219-1377 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.860)
ISI Accession Number ID	WOS:000324652900008
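
The abstract names the two approximating distributions but does not spell out the algorithms. As a rough illustration only, and not the authors' implementation, the sketch below assumes the tuple uncertainty model with independent transactions, where probs[i] is the probability that an itemset occurs in transaction i. Under that assumption the support count follows a Poisson binomial distribution, and the frequentness probability P(support >= minsup) can be computed exactly by dynamic programming or approximated with a Poisson or a normal model. All function names and parameters are hypothetical.

```python
import math


def exact_frequentness(probs, minsup):
    """P(support >= minsup), computed exactly in O(n^2) time."""
    # dist[k] = probability that exactly k of the transactions
    # processed so far contain the itemset.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += q * p       # itemset present in this transaction
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Poisson model: support ~ Poisson(lam), lam = expected support."""
    lam = sum(probs)
    # Accumulate P(X < minsup) term by term. Note: exp(-lam) underflows
    # for very large lam; a production version would work in log space.
    term = math.exp(-lam)
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= lam / (k + 1)
    return max(0.0, 1.0 - cdf)


def normal_frequentness(probs, minsup):
    """Normal model with continuity correction."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # Degenerate case: the support is deterministic.
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Usage: 400 transactions, each containing the itemset with probability 0.05.
probs = [0.05] * 400
for name, fn in [("exact", exact_frequentness),
                 ("poisson", poisson_frequentness),
                 ("normal", normal_frequentness)]:
    print(name, fn(probs, 25))
```

Both approximations need only a single pass over the probabilities, whereas the exact computation is quadratic in the number of transactions; that gap is the kind of saving the abstract alludes to, although the speedups reported in the paper refer to the authors' own algorithms rather than to this sketch.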

DC Field	Value	Language
dc.contributor.author	Bernecker, T	en_US
dc.contributor.author	Cheng, R	en_US
dc.contributor.author	Cheung, DW	en_US
dc.contributor.author	Kriegel, HP	en_US
dc.contributor.author	Lee, SD	en_US
dc.contributor.author	Renz, M	en_US
dc.contributor.author	Verhein, F	en_US
dc.contributor.author	Wang, L	en_US
dc.contributor.author	Zuefle, A	en_US
dc.date.accessioned	2012-09-20T08:24:20Z	-
dc.date.available	2012-09-20T08:24:20Z	-
dc.date.issued	2013	en_US
dc.identifier.citation	Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217	en_US
dc.identifier.issn	0219-1377	-
dc.identifier.uri	http://hdl.handle.net/10722/165826	-
dc.description.abstract	Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s).	-
dc.language	eng	en_US
dc.publisher	Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/	en_US
dc.relation.ispartof	Knowledge and Information Systems	en_US
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	en_US
dc.title	Model-based probabilistic frequent itemset mining	en_US
dc.type	Article	en_US
dc.identifier.email	Bernecker, T: bernecker@dbs.ifi.lmu.de	en_US
dc.identifier.email	Cheng, R: ckcheng@cs.hku.hk	en_US
dc.identifier.email	Cheung, DW: dcheung@cs.hku.hk	en_US
dc.identifier.email	Kriegel, HP: kriegel@dbs.ifi.lmu.de	-
dc.identifier.email	Lee, SD: sdlee@cs.hku.hk	-
dc.identifier.email	Renz, M: renz@dbs.ifi.lmu.de	-
dc.identifier.email	Verhein, F: verhein@dbs.ifi.lmu.de	-
dc.identifier.email	Wang, L: lwang@cs.hku.hk	-
dc.identifier.email	Zuefle, A: zuefle@dbs.ifi.lmu.de	-
dc.identifier.authority	Cheng, CK=rp00074	en_US
dc.identifier.authority	Cheung, DWL=rp00101	en_US
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1007/s10115-012-0561-2	-
dc.identifier.scopus	eid_2-s2.0-84884587755	-
dc.identifier.hkuros	206202	en_US
dc.identifier.isi	WOS:000324652900008	-
dc.publisher.place	United Kingdom	-
dc.identifier.citeulike	11545593	-
dc.identifier.issnl	0219-3116	-
