Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1007/s00778-010-0192-8
- Scopus: eid_2-s2.0-78751575731
- WOS: WOS:000286432000005
Article: Local and global recoding methods for anonymizing set-valued data
Field | Value |
---|---|
Title | Local and global recoding methods for anonymizing set-valued data |
Authors | Terrovitis, M; Mamoulis, N; Kalnis, P |
Keywords | Anonymity; Database privacy; Set-valued data |
Issue Date | 2011 |
Publisher | Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm |
Citation | VLDB Journal, 2011, v. 20 n. 1, p. 83-106 |
Abstract | In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag. |
Persistent Identifier | http://hdl.handle.net/10722/138037 |
ISSN | 1066-8888 (2023 Impact Factor: 2.8; 2023 SCImago Journal Rankings: 1.853) |
ISI Accession Number ID | WOS:000286432000005 |
Funding Information | We would like to thank the authors of [12] for sharing with us the implementation of the Partition algorithm. This work was supported by grant HKU 715108E from Hong Kong RGC. |
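
The abstract names the k^m-anonymity guarantee without spelling it out. As a rough illustration only, the sketch below assumes the usual reading of that guarantee: every combination of at most m items that occurs in some transaction must occur in at least k transactions. The function names (`violating_itemsets`, `is_km_anonymous`, `generalize`) and the toy data are hypothetical, and the brute-force enumeration here is not the optimal algorithm or the Apriori-style heuristic described in the paper; it only checks the property and applies a given item-to-category generalization.

```python
from collections import Counter
from itertools import combinations


def violating_itemsets(transactions, k, m):
    """Return itemsets of size <= m that occur in fewer than k transactions.

    Assumed definition: an adversary who knows up to m items of a
    transaction should always match at least k transactions.
    """
    support = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return {combo: n for combo, n in support.items() if n < k}


def is_km_anonymous(transactions, k, m):
    """True if no itemset of size <= m has support below k."""
    return not violating_itemsets(transactions, k, m)


def generalize(transactions, mapping):
    """Replace each item by its generalized value (hypothetical hierarchy)."""
    return [{mapping.get(item, item) for item in t} for t in transactions]


# Toy database: every single item appears twice, but every pair only once.
db = [{"a1", "b1"}, {"a1", "b2"}, {"a2", "b1"}, {"a2", "b2"}]
print(is_km_anonymous(db, k=2, m=2))           # pairs are too rare

# Generalizing b1 and b2 to a common category B merges the rare pairs.
generalized = generalize(db, {"b1": "B", "b2": "B"})
print(is_km_anonymous(generalized, k=2, m=2))
```

Enumerating all itemsets up to size m is exponential in m, which is one reason a level-wise search of the kind the abstract mentions is attractive for larger inputs.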
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Terrovitis, M | en_HK |
dc.contributor.author | Mamoulis, N | en_HK |
dc.contributor.author | Kalnis, P | en_HK |
dc.date.accessioned | 2011-08-26T14:39:03Z | - |
dc.date.available | 2011-08-26T14:39:03Z | - |
dc.date.issued | 2011 | en_HK |
dc.identifier.citation | VLDB Journal, 2011, v. 20 n. 1, p. 83-106 | en_HK |
dc.identifier.issn | 1066-8888 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/138037 | - |
dc.description.abstract | In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag. | en_HK |
dc.language | eng | en_US |
dc.publisher | Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm | en_HK |
dc.relation.ispartof | VLDB Journal | en_HK |
dc.rights | The original publication is available at www.springerlink.com | - |
dc.subject | Anonymity | en_HK |
dc.subject | Database privacy | en_HK |
dc.subject | Set-valued data | en_HK |
dc.title | Local and global recoding methods for anonymizing set-valued data | en_HK |
dc.type | Article | en_HK |
dc.identifier.email | Mamoulis, N:nikos@cs.hku.hk | en_HK |
dc.identifier.authority | Mamoulis, N=rp00155 | en_HK |
dc.description.nature | postprint | - |
dc.identifier.doi | 10.1007/s00778-010-0192-8 | en_HK |
dc.identifier.scopus | eid_2-s2.0-78751575731 | en_HK |
dc.identifier.hkuros | 190925 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-78751575731&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 20 | en_HK |
dc.identifier.issue | 1 | en_HK |
dc.identifier.spage | 83 | en_HK |
dc.identifier.epage | 106 | en_HK |
dc.identifier.isi | WOS:000286432000005 | - |
dc.publisher.place | Germany | en_HK |
dc.relation.project | Privacy Preservation in the Publication of Data Sequences | - |
dc.identifier.scopusauthorid | Terrovitis, M=36907107900 | en_HK |
dc.identifier.scopusauthorid | Mamoulis, N=6701782749 | en_HK |
dc.identifier.scopusauthorid | Kalnis, P=6603477534 | en_HK |
dc.identifier.citeulike | 7324763 | - |
dc.identifier.issnl | 1066-8888 | - |