
Article: Local and global recoding methods for anonymizing set-valued data

Title: Local and global recoding methods for anonymizing set-valued data
Authors: Terrovitis, M; Mamoulis, N; Kalnis, P
Keywords: Anonymity; Database privacy; Set-valued data
Issue Date: 2011
Publisher: Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm
Citation: VLDB Journal, 2011, v. 20 n. 1, p. 83-106
Abstract: In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag.
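The k^m-anonymity guarantee named in the abstract can be illustrated with a small sketch. The abstract does not spell out the definition, so this assumes the usual reading: every combination of at most m items that occurs in some transaction must be contained in at least k transactions. The function names (`km_violations`, `generalize`), the toy transactions, and the item hierarchy are hypothetical and are not the authors' algorithms; the code only checks the guarantee by brute force and applies one fixed global recoding.

```python
from collections import Counter
from itertools import combinations


def km_violations(transactions, k, m):
    """Return itemsets of size <= m that occur in fewer than k transactions."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, m + 1):
            # Count every size-`size` subset of this transaction once.
            for combo in combinations(items, size):
                support[combo] += 1
    return {combo: n for combo, n in support.items() if n < k}


def generalize(transactions, mapping):
    """Globally recode items; set semantics merge items that map to the same value."""
    return [{mapping.get(item, item) for item in t} for t in transactions]


# Hypothetical toy data and generalization hierarchy (assumptions for illustration).
transactions = [
    {"skim milk", "beer"},
    {"whole milk", "beer"},
    {"skim milk", "wine"},
    {"whole milk", "wine"},
]
mapping = {
    "skim milk": "milk",
    "whole milk": "milk",
    "beer": "alcohol",
    "wine": "alcohol",
}

before = km_violations(transactions, k=2, m=2)
after = km_violations(generalize(transactions, mapping), k=2, m=2)
print(len(before), len(after))
```

In this toy case every single item is shared by two transactions, but each pair of items identifies exactly one transaction, so the original data is not 2^2-anonymous. After recoding, all four transactions become {milk, alcohol} and no violating itemset remains. The brute-force count enumerates all subsets up to size m, so it is only practical for small m and short transactions.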
Persistent Identifier: http://hdl.handle.net/10722/138037
ISSN: 1066-8888
2015 Impact Factor: 1.744
2015 SCImago Journal Rankings: 0.899
ISI Accession Number ID: WOS:000286432000005
Funding Agency: Hong Kong RGC
Grant Number: HKU 715108E
Funding Information:

We would like to thank the authors of [12] for sharing with us the implementation of the Partition algorithm. This work was supported by grant HKU 715108E from Hong Kong RGC.


DC Field | Value | Language
dc.contributor.author | Terrovitis, M | en_HK
dc.contributor.author | Mamoulis, N | en_HK
dc.contributor.author | Kalnis, P | en_HK
dc.date.accessioned | 2011-08-26T14:39:03Z | -
dc.date.available | 2011-08-26T14:39:03Z | -
dc.date.issued | 2011 | en_HK
dc.identifier.citation | VLDB Journal, 2011, v. 20 n. 1, p. 83-106 | en_HK
dc.identifier.issn | 1066-8888 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/138037 | -
dc.description.abstract | In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag. | en_HK
dc.language | eng | en_US
dc.publisher | Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm | en_HK
dc.relation.ispartof | VLDB Journal | en_HK
dc.rights | The original publication is available at www.springerlink.com | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject | Anonymity | en_HK
dc.subject | Database privacy | en_HK
dc.subject | Set-valued data | en_HK
dc.title | Local and global recoding methods for anonymizing set-valued data | en_HK
dc.type | Article | en_HK
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | en_HK
dc.identifier.authority | Mamoulis, N=rp00155 | en_HK
dc.description.nature | postprint | -
dc.identifier.doi | 10.1007/s00778-010-0192-8 | en_HK
dc.identifier.scopus | eid_2-s2.0-78751575731 | en_HK
dc.identifier.hkuros | 190925 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78751575731&selection=ref&src=s&origin=recordpage | en_HK
dc.identifier.volume | 20 | en_HK
dc.identifier.issue | 1 | en_HK
dc.identifier.spage | 83 | en_HK
dc.identifier.epage | 106 | en_HK
dc.identifier.isi | WOS:000286432000005 | -
dc.publisher.place | Germany | en_HK
dc.relation.project | Privacy Preservation in the Publication of Data Sequences | -
dc.identifier.scopusauthorid | Terrovitis, M=36907107900 | en_HK
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_HK
dc.identifier.scopusauthorid | Kalnis, P=6603477534 | en_HK
dc.identifier.citeulike | 7324763 | -
