UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System

Li, TO; Jiang, J; Qi, J; So, CC; Ma, JC; Chen, X; Shen, T; Cui, H; Wang, Y; Wang, P

Publisher Website: 10.1109/DSN48063.2020.00064
Scopus: eid_2-s2.0-85090405505
WOS: WOS:000617924900044

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System

Title	UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System
Authors	Li, TO; Jiang, J; Qi, J; So, CC; Ma, JC; Chen, X; Shen, T; Cui, H; Wang, Y; Wang, P
Keywords	Sensitivity; Flexible printed circuits; Sparks; Static analysis
Issue Date	2020
Publisher	IEEE. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000192
Citation	Proceedings of 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Valencia, Spain, 29 June-2 July 2020, p. 515-527. DOI: http://dx.doi.org/10.1109/DSN48063.2020.00064
Abstract	In the era of big data, individuals and institutions store their sensitive data in the cloud, and these data are often analyzed by MapReduce frameworks (e.g., Spark). However, releasing the results of computations on these data may leak private information. Differential Privacy (DP) is a powerful method for protecting the privacy of an individual data record in a computation result. Given an input dataset and a query, DP typically perturbs an output value with noise proportional to the sensitivity: the greatest change in an output value when a record is added to or removed from the input dataset. Unfortunately, directly computing the sensitivity for a query and an input dataset is computationally infeasible, because it requires adding or removing every record and rerunning the same query each time: a dataset of one million input records requires running the same query more than one million times. This paper presents UPA, the first automated, accurate, and efficient sensitivity-inference approach for big-data mining applications. Our key observation is that MapReduce operators often have commutative and associative properties in order to enable parallelism and fault tolerance among computers. Therefore, UPA can greatly reduce the repeated computations at runtime while automatically computing a precise sensitivity value for general big-data queries. We compared UPA with FLEX, the most relevant work, which uses static analysis on queries to infer sensitivity values. In an extensive evaluation on nine diverse Spark queries, UPA supports all nine queries, while FLEX supports only five. For the five queries that both UPA and FLEX support, UPA enforces DP with sensitivity values five orders of magnitude more accurate than FLEX's. UPA has reasonable performance overhead compared to native Spark. UPA's source code is available at https://github.com/hku-systems/UPA.
Description	Session 11 - Trusted Cloud Computing
Persistent Identifier	http://hdl.handle.net/10722/286407
ISSN	1530-0889
ISI Accession Number ID	WOS:000617924900044
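
The sensitivity idea described in the abstract can be illustrated with a small sketch. This is not UPA's algorithm or code; the function names (`brute_force_sensitivity`, `reuse_partial_sensitivity`, `laplace_noise`) are hypothetical, only record removal (not addition) is considered, and the query is assumed to be a map followed by an associative reduce over numeric values. The first function reruns the query once per removed record; the second reuses prefix and suffix partial reductions so that each leave-one-out result costs a single extra combine.

```python
import random
from functools import reduce


def brute_force_sensitivity(records, mapper, combine, identity):
    """Rerun the whole query once per removed record (quadratic in len(records))."""
    full = reduce(combine, map(mapper, records), identity)
    worst = 0.0
    for i in range(len(records)):
        rest = records[:i] + records[i + 1:]
        out = reduce(combine, map(mapper, rest), identity)
        worst = max(worst, abs(full - out))
    return worst


def reuse_partial_sensitivity(records, mapper, combine, identity):
    """Same result with a linear number of combine calls.

    Assumes `combine` is associative with `identity` as its neutral element.
    """
    mapped = [mapper(r) for r in records]
    n = len(mapped)
    prefix = [identity] * (n + 1)  # prefix[i] reduces mapped[0:i]
    for i in range(n):
        prefix[i + 1] = combine(prefix[i], mapped[i])
    suffix = [identity] * (n + 1)  # suffix[i] reduces mapped[i:n]
    for i in range(n - 1, -1, -1):
        suffix[i] = combine(mapped[i], suffix[i + 1])
    full = prefix[n]
    worst = 0.0
    for i in range(n):
        without_i = combine(prefix[i], suffix[i + 1])  # query result minus record i
        worst = max(worst, abs(full - without_i))
    return worst


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


if __name__ == "__main__":
    data = [3, 10, 4, 7]
    epsilon = 1.0  # illustrative privacy parameter, not taken from the paper
    s = reuse_partial_sensitivity(data, lambda x: x, lambda a, b: a + b, 0)
    noisy_sum = sum(data) + laplace_noise(s / epsilon, random.Random(0))
    print("sensitivity:", s, "noisy sum:", noisy_sum)
```

The noise step only mirrors the abstract's "noise proportional to sensitivity"; calibrating noise to a sensitivity measured on one concrete dataset is not, on its own, a complete differentially private mechanism.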

DC Field	Value	Language
dc.contributor.author	Li, TO	-
dc.contributor.author	Jiang, J	-
dc.contributor.author	Qi, J	-
dc.contributor.author	So, CC	-
dc.contributor.author	Ma, JC	-
dc.contributor.author	Chen, X	-
dc.contributor.author	Shen, T	-
dc.contributor.author	Cui, H	-
dc.contributor.author	Wang, Y	-
dc.contributor.author	Wang, P	-
dc.date.accessioned	2020-08-31T07:03:27Z	-
dc.date.available	2020-08-31T07:03:27Z	-
dc.date.issued	2020	-
dc.identifier.citation	Proceedings of 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Valencia, Spain, 29 June-2 July 2020, p. 515-527	-
dc.identifier.issn	1530-0889	-
dc.identifier.uri	http://hdl.handle.net/10722/286407	-
dc.description	Session 11 - Trusted Cloud Computing	-
dc.description.abstract	In the era of big data, individuals and institutions store their sensitive data in the cloud, and these data are often analyzed by MapReduce frameworks (e.g., Spark). However, releasing the results of computations on these data may leak private information. Differential Privacy (DP) is a powerful method for protecting the privacy of an individual data record in a computation result. Given an input dataset and a query, DP typically perturbs an output value with noise proportional to the sensitivity: the greatest change in an output value when a record is added to or removed from the input dataset. Unfortunately, directly computing the sensitivity for a query and an input dataset is computationally infeasible, because it requires adding or removing every record and rerunning the same query each time: a dataset of one million input records requires running the same query more than one million times. This paper presents UPA, the first automated, accurate, and efficient sensitivity-inference approach for big-data mining applications. Our key observation is that MapReduce operators often have commutative and associative properties in order to enable parallelism and fault tolerance among computers. Therefore, UPA can greatly reduce the repeated computations at runtime while automatically computing a precise sensitivity value for general big-data queries. We compared UPA with FLEX, the most relevant work, which uses static analysis on queries to infer sensitivity values. In an extensive evaluation on nine diverse Spark queries, UPA supports all nine queries, while FLEX supports only five. For the five queries that both UPA and FLEX support, UPA enforces DP with sensitivity values five orders of magnitude more accurate than FLEX's. UPA has reasonable performance overhead compared to native Spark. UPA's source code is available at https://github.com/hku-systems/UPA.	-
dc.language	eng	-
dc.publisher	IEEE. The Journal's web site is located at https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000192	-
dc.relation.ispartof	International Conference on Dependable Systems and Networks (DSN) Proceedings	-
dc.rights	International Conference on Dependable Systems and Networks (DSN) Proceedings. Copyright © IEEE.	-
dc.rights	©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.subject	Sensitivity	-
dc.subject	Flexible printed circuits	-
dc.subject	Sparks	-
dc.subject	Static analysis	-
dc.title	UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System	-
dc.type	Conference_Paper	-
dc.identifier.email	Cui, H: heming@cs.hku.hk	-
dc.identifier.email	Wang, Y: amywang@hku.hk	-
dc.identifier.authority	Cui, H=rp02008	-
dc.identifier.doi	10.1109/DSN48063.2020.00064	-
dc.identifier.scopus	eid_2-s2.0-85090405505	-
dc.identifier.hkuros	313508	-
dc.identifier.spage	515	-
dc.identifier.epage	527	-
dc.identifier.isi	WOS:000617924900044	-
dc.publisher.place	United States	-
dc.identifier.issnl	1530-0889	-
