One Step to Efficient Synthetic Data

Awan, Jordan; Cai, Zhanrui

Publisher Website: 10.5705/ss.202022.0274
Appears in Collections:
- Faculty of Business & Economics: Journal/Magazine Articles

Article: One Step to Efficient Synthetic Data

Title	One Step to Efficient Synthetic Data
Authors	Awan, Jordan; Cai, Zhanrui
Issue Date	6-Sep-2023
Publisher	Institute of Statistical Science
Citation	Statistica Sinica, 2024, v. Forthcoming. DOI: http://dx.doi.org/10.5705/ss.202022.0274
Abstract	A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error and show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions.
Persistent Identifier	http://hdl.handle.net/10722/333967
ISSN	1017-0405 (2023 Impact Factor: 1.5; 2023 SCImago Journal Rankings: 1.368)

DC Field	Value	Language
dc.contributor.author	Awan, Jordan	-
dc.contributor.author	Cai, Zhanrui	-
dc.date.accessioned	2023-10-10T03:15:01Z	-
dc.date.available	2023-10-10T03:15:01Z	-
dc.date.issued	2023-09-06	-
dc.identifier.citation	Statistica Sinica, 2024, v. Forthcoming	-
dc.identifier.issn	1017-0405	-
dc.identifier.uri	http://hdl.handle.net/10722/333967	-
dc.description.abstract	A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error and show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions.	-
dc.language	eng	-
dc.publisher	Institute of Statistical Science	-
dc.relation.ispartof	Statistica Sinica	-
dc.title	One Step to Efficient Synthetic Data	-
dc.type	Article	-
dc.identifier.doi	10.5705/ss.202022.0274	-
dc.identifier.volume	Forthcoming	-
dc.identifier.issnl	1017-0405	-
