
Article: Model-based probabilistic frequent itemset mining

Title: Model-based probabilistic frequent itemset mining
Authors: Bernecker, T; Cheng, R; Cheung, DW; Kriegel, HP; Lee, SD; Renz, M; Verhein, F; Wang, L; Zuefle, A
Issue Date: 2013
Publisher: Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/
Citation: Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217
Abstract: Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have been recently developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods to capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s).
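
Illustrative note: the idea in the abstract of replacing the exact support distribution with a Poisson or normal model can be sketched as follows. This is not the authors' algorithm. It is a minimal sketch that assumes a tuple-uncertainty setting in which each transaction contains the itemset independently with a known probability, so the support count is a sum of independent Bernoulli variables. The function names, the continuity correction in the normal model, and the example data are all assumptions made for illustration.

```python
import math


def exact_frequentness(probs, minsup):
    """P(support >= minsup), computed exactly by dynamic programming.

    probs[i] is the (assumed independent) probability that transaction i
    contains the itemset. Runs in O(n^2) time for n transactions.
    """
    dist = [1.0]  # dist[k] = P(support == k) over the transactions seen so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += q * p       # transaction contains the itemset
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Poisson model: support ~ Poisson(lambda) with lambda = sum(probs)."""
    lam = sum(probs)
    term = math.exp(-lam)  # Poisson pmf at k = 0
    cdf = 0.0
    for k in range(minsup):  # accumulate P(support <= minsup - 1)
        cdf += term
        term *= lam / (k + 1)
    return 1.0 - cdf


def normal_frequentness(probs, minsup):
    """Normal model with matching mean and variance (continuity-corrected)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:  # degenerate case: the support is deterministic
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical data: 400 transactions, each containing the itemset
    # with probability 0.05, and a minimum support of 25.
    probs = [0.05] * 400
    for f in (exact_frequentness, poisson_frequentness, normal_frequentness):
        print(f.__name__, f(probs, 25))
```

Both approximations need only a single pass over the probabilities to accumulate the mean (and, for the normal model, the variance), whereas the exact dynamic program is quadratic in the number of transactions.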
Persistent Identifier: http://hdl.handle.net/10722/165826
ISSN: 0219-1377
2015 Impact Factor: 1.702
2015 SCImago Journal Rankings: 1.094
ISI Accession Number ID: WOS:000324652900008

 

DC Field | Value | Language
dc.contributor.author | Bernecker, T | en_US
dc.contributor.author | Cheng, R | en_US
dc.contributor.author | Cheung, DW | en_US
dc.contributor.author | Kriegel, HP | en_US
dc.contributor.author | Lee, SD | en_US
dc.contributor.author | Renz, M | en_US
dc.contributor.author | Verhein, F | en_US
dc.contributor.author | Wang, L | en_US
dc.contributor.author | Zuefle, A | en_US
dc.date.accessioned | 2012-09-20T08:24:20Z | -
dc.date.available | 2012-09-20T08:24:20Z | -
dc.date.issued | 2013 | en_US
dc.identifier.citation | Knowledge and Information Systems, 2013, v. 37 n. 1, p. 181-217 | en_US
dc.identifier.issn | 0219-1377 | -
dc.identifier.uri | http://hdl.handle.net/10722/165826 | -
dc.description.abstract | Data uncertainty is inherent in emerging applications such as location-based services, sensor monitoring systems, and data integration. To handle a large amount of imprecise information, uncertain databases have been recently developed. In this paper, we study how to efficiently discover frequent itemsets from large uncertain databases, interpreted under the Possible World Semantics. This is technically challenging, since an uncertain database induces an exponential number of possible worlds. To tackle this problem, we propose novel methods to capture the itemset mining process as a probability distribution function, taking two models into account: the Poisson distribution and the normal distribution. These model-based approaches extract frequent itemsets with a high degree of accuracy and support large databases. We apply our techniques to improve the performance of the algorithms for (1) finding itemsets whose frequentness probabilities are larger than some threshold and (2) mining itemsets with the k highest frequentness probabilities. Our approaches support both tuple and attribute uncertainty models, which are commonly used to represent uncertain databases. Extensive evaluation on real and synthetic datasets shows that our methods are highly accurate and four orders of magnitude faster than previous approaches. In further theoretical and experimental studies, we give an intuition as to which model-based approach best fits different types of data sets. © 2012 The Author(s). | -
dc.language | eng | en_US
dc.publisher | Springer-Verlag London Ltd. The Journal's web site is located at http://link.springer.de/link/service/journals/10115/ | en_US
dc.relation.ispartof | Knowledge and Information Systems | en_US
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | en_US
dc.title | Model-based probabilistic frequent itemset mining | en_US
dc.type | Article | en_US
dc.identifier.email | Bernecker, T: bernecker@dbs.ifi.lmu.de | en_US
dc.identifier.email | Cheng, R: ckcheng@cs.hku.hk | en_US
dc.identifier.email | Cheung, DW: dcheung@cs.hku.hk | en_US
dc.identifier.email | Kriegel, HP: kriegel@dbs.ifi.lmu.de | -
dc.identifier.email | Lee, SD: sdlee@cs.hku.hk | -
dc.identifier.email | Renz, M: renz@dbs.ifi.lmu.de | -
dc.identifier.email | Verhein, F: verhein@dbs.ifi.lmu.de | -
dc.identifier.email | Wang, L: lwang@cs.hku.hk | -
dc.identifier.email | Zuefle, A: zuefle@dbs.ifi.lmu.de | -
dc.identifier.authority | Cheng, CK=rp00074 | en_US
dc.identifier.authority | Cheung, DWL=rp00101 | en_US
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1007/s10115-012-0561-2 | -
dc.identifier.scopus | eid_2-s2.0-84884587755 | -
dc.identifier.hkuros | 206202 | en_US
dc.identifier.isi | WOS:000324652900008 | -
dc.publisher.place | United Kingdom | -
dc.identifier.citeulike | 11545593 | -
