Local and global recoding methods for anonymizing set-valued data

Terrovitis, M; Mamoulis, N; Kalnis, P

Publisher Website: 10.1007/s00778-010-0192-8
Scopus: eid_2-s2.0-78751575731
WOS: WOS:000286432000005
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title

Local and global recoding methods for anonymizing set-valued data

Authors

Terrovitis, M; Mamoulis, N; Kalnis, P

Keywords

Anonymity
Database privacy
Set-valued data

Issue Date

2011

Publisher

Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm

Citation

VLDB Journal, 2011, v. 20 n. 1, p. 83-106

DOI: http://dx.doi.org/10.1007/s00778-010-0192-8

Abstract

In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag.
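
The abstract names the k^m-anonymity guarantee without defining it. The sketch below illustrates one reading of it, stated here as an assumption: a database is k^m-anonymous if every combination of at most m items that occurs in some transaction occurs in at least k transactions, so an adversary who knows up to m items cannot narrow a person down to fewer than k records. The function names, the toy transactions, and the one-level `hierarchy` mapping are all hypothetical. The brute-force check is for illustration only and is not the optimal algorithm, the Apriori-style heuristic, or the partitioning methods described in the paper.

```python
from collections import Counter
from itertools import combinations


def violating_itemsets(transactions, k, m):
    """Return itemsets of size <= m that occur in at least one
    transaction but in fewer than k transactions (assumed definition)."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return {combo: n for combo, n in support.items() if n < k}


def is_km_anonymous(transactions, k, m):
    """True if no itemset of size <= m has support below k."""
    return not violating_itemsets(transactions, k, m)


def generalize(transactions, mapping):
    """Global recoding: replace every occurrence of an item by its
    generalized value; items absent from the mapping are kept as-is."""
    return [frozenset(mapping.get(item, item) for item in t)
            for t in transactions]


# Hypothetical toy data: four supermarket transactions.
db = [
    {"beer", "milk"},
    {"wine", "milk"},
    {"beer", "bread"},
    {"wine", "bread"},
]

# Hypothetical one-level generalization hierarchy.
hierarchy = {"beer": "alcohol", "wine": "alcohol"}

if __name__ == "__main__":
    print(is_km_anonymous(db, k=2, m=1))   # single items
    print(is_km_anonymous(db, k=2, m=2))   # pairs of items
    print(is_km_anonymous(generalize(db, hierarchy), k=2, m=2))
```

In this toy example every single item appears in two transactions, but each pair appears only once, so an adversary who knows two items can single out a record. Generalizing both drinks to a common value merges the distinguishing pairs, which is the intuition behind preferring generalization to suppression.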

Persistent Identifier

http://hdl.handle.net/10722/138037

ISSN

1066-8888

2023 Impact Factor: 2.8

2023 SCImago Journal Rankings: 1.853

ISI Accession Number ID

WOS:000286432000005

Funding Agency	Grant Number
Hong Kong RGC	HKU 715108E

Funding Information:

We would like to thank the authors of [12] for sharing with us the implementation of the Partition algorithm. This work was supported by grant HKU 715108E from Hong Kong RGC.

Grants

Privacy Preservation in the Publication of Data Sequences

DC Field	Value	Language
dc.contributor.author	Terrovitis, M	en_HK
dc.contributor.author	Mamoulis, N	en_HK
dc.contributor.author	Kalnis, P	en_HK
dc.date.accessioned	2011-08-26T14:39:03Z	-
dc.date.available	2011-08-26T14:39:03Z	-
dc.date.issued	2011	en_HK
dc.identifier.citation	VLDB Journal, 2011, v. 20 n. 1, p. 83-106	en_HK
dc.identifier.issn	1066-8888	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/138037	-
dc.description.abstract	In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag.	en_HK
dc.language	eng	en_US
dc.publisher	Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00778/index.htm	en_HK
dc.relation.ispartof	VLDB Journal	en_HK
dc.rights	The original publication is available at www.springerlink.com	-
dc.subject	Anonymity	en_HK
dc.subject	Database privacy	en_HK
dc.subject	Set-valued data	en_HK
dc.title	Local and global recoding methods for anonymizing set-valued data	en_HK
dc.type	Article	en_HK
dc.identifier.email	Mamoulis, N:nikos@cs.hku.hk	en_HK
dc.identifier.authority	Mamoulis, N=rp00155	en_HK
dc.description.nature	postprint	-
dc.identifier.doi	10.1007/s00778-010-0192-8	en_HK
dc.identifier.scopus	eid_2-s2.0-78751575731	en_HK
dc.identifier.hkuros	190925	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-78751575731&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	20	en_HK
dc.identifier.issue	1	en_HK
dc.identifier.spage	83	en_HK
dc.identifier.epage	106	en_HK
dc.identifier.isi	WOS:000286432000005	-
dc.publisher.place	Germany	en_HK
dc.relation.project	Privacy Preservation in the Publication of Data Sequences	-
dc.identifier.scopusauthorid	Terrovitis, M=36907107900	en_HK
dc.identifier.scopusauthorid	Mamoulis, N=6701782749	en_HK
dc.identifier.scopusauthorid	Kalnis, P=6603477534	en_HK
dc.identifier.citeulike	7324763	-
dc.identifier.issnl	1066-8888	-
