Article: Model-based probabilistic frequent itemset mining
Title | Model-based probabilistic frequent itemset mining |
---|---|
Authors | Bernecker, T; Cheng, R; Cheung, DW; Kriegel, HP; Lee, SD; Renz, M; Verhein, F; Wang, L; Zuefle, A |
Issue Date | 2013 |
Publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ |
Citation | Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217 |
Abstract | Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s). |
Persistent Identifier | http://hdl.handle.net/10722/165826 |
ISSN | 0219-1377 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.860) |
ISI Accession Number ID | WOS:000324652900008 |
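
The abstract describes approximating the distribution of an itemset's support with a Poisson or a normal model, and using it to compute the frequentness probability, that is, the probability that the support reaches a minimum threshold. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: tuple uncertainty with independent transactions, where transaction i contains the itemset with probability `probs[i]`. The function names, the `minsup` parameter, and the continuity correction in the normal model are illustrative choices, not taken from the paper. An exact dynamic-programming baseline is included for comparison.

```python
import math


def exact_frequentness(probs, minsup):
    """Exact P(support >= minsup) for independent transactions (baseline)."""
    # dist[k] = probability that exactly k transactions seen so far contain the itemset
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += mass * p       # itemset present in this transaction
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Poisson model: support ~ Poisson(lambda), lambda = expected support."""
    lam = sum(probs)
    term = math.exp(-lam)                # P(support = 0)
    cdf = 0.0
    for k in range(minsup):              # accumulate P(support <= minsup - 1)
        cdf += term
        term *= lam / (k + 1)
    return 1.0 - cdf


def normal_frequentness(probs, minsup):
    """Normal model: support ~ N(mu, var), with a continuity correction (assumption)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:                       # all transactions are certain
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail 1 - Phi(z)


if __name__ == "__main__":
    sparse = [0.01] * 500                # many unlikely occurrences
    dense = [0.5] * 400                  # many moderately likely occurrences
    for name, probs, minsup in [("sparse", sparse, 5), ("dense", dense, 200)]:
        print(name,
              exact_frequentness(probs, minsup),
              poisson_frequentness(probs, minsup),
              normal_frequentness(probs, minsup))
```

Both approximations need only one pass over the probabilities, whereas the exact baseline takes time quadratic in the number of transactions. In this sketch the Poisson model is the natural candidate when the individual probabilities are small, and the normal model when the variance of the support is large; which model fits which kind of data set is the question the abstract says the paper studies.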

DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bernecker, T | en_US |
dc.contributor.author | Cheng, R | en_US |
dc.contributor.author | Cheung, DW | en_US |
dc.contributor.author | Kriegel, HP | en_US |
dc.contributor.author | Lee, SD | en_US |
dc.contributor.author | Renz, M | en_US |
dc.contributor.author | Verhein, F | en_US |
dc.contributor.author | Wang, L | en_US |
dc.contributor.author | Zuefle, A | en_US |
dc.date.accessioned | 2012-09-20T08:24:20Z | - |
dc.date.available | 2012-09-20T08:24:20Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.citation | Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217 | en_US |
dc.identifier.issn | 0219-1377 | - |
dc.identifier.uri | http://hdl.handle.net/10722/165826 | - |
dc.description.abstract | Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle large amounts of imprecise information, uncertain databases have recently been developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods that capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s). | - |
dc.language | eng | en_US |
dc.publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ | en_US |
dc.relation.ispartof | Knowledge and Information Systems | en_US |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | en_US |
dc.title | Model-based probabilistic frequent itemset mining | en_US |
dc.type | Article | en_US |
dc.identifier.email | Bernecker, T: bernecker@dbs.ifi.lmu.de | en_US |
dc.identifier.email | Cheng, R: ckcheng@cs.hku.hk | en_US |
dc.identifier.email | Cheung, DW: dcheung@cs.hku.hk | en_US |
dc.identifier.email | Kriegel, HP: kriegel@dbs.ifi.lmu.de | - |
dc.identifier.email | Lee, SD: sdlee@cs.hku.hk | - |
dc.identifier.email | Renz, M: renz@dbs.ifi.lmu.de | - |
dc.identifier.email | Verhein, F: verhein@dbs.ifi.lmu.de | - |
dc.identifier.email | Wang, L: lwang@cs.hku.hk | - |
dc.identifier.email | Zuefle, A: zuefle@dbs.ifi.lmu.de | - |
dc.identifier.authority | Cheng, CK=rp00074 | en_US |
dc.identifier.authority | Cheung, DWL=rp00101 | en_US |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1007/s10115-012-0561-2 | - |
dc.identifier.scopus | eid_2-s2.0-84884587755 | - |
dc.identifier.hkuros | 206202 | en_US |
dc.identifier.isi | WOS:000324652900008 | - |
dc.publisher.place | United Kingdom | - |
dc.identifier.citeulike | 11545593 | - |
dc.identifier.issnl | 0219-3116 | - |