Article: One Step to Efficient Synthetic Data

Title: One Step to Efficient Synthetic Data
Authors: Jordan Awan, Zhanrui Cai
Issue Date: 6-Sep-2023
Publisher: Institute of Statistical Science
Citation: Statistica Sinica, 2024, v. Forthcoming
Abstract

A common approach to generating synthetic data is to sample from a fitted model. We show that, under general assumptions, this approach yields a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable to parametric models, has asymptotically efficient summary statistics, and is easy to implement and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, and fully synthetic datasets, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error, and we show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Beyond synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions.
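To make the inefficiency claim concrete in the simplest case: if the data come from N(μ, σ²) and a synthetic sample of the same size n is drawn from the fitted N(μ̂, σ̂²), the synthetic sample mean has variance of about σ²/n + σ̂²/n ≈ 2σ²/n, roughly twice that of the efficient estimator μ̂ = x̄. The Python sketch below (standard library only) checks this by Monte Carlo and contrasts it with a hypothetical `matched_synthetic` helper that shifts and rescales normal seeds so that the synthetic sample reproduces the observed maximum likelihood estimates exactly. The normal model, sample size, replication count, and all function names are assumptions for illustration; the rescaling trick works only for this location-scale model and is not the paper's general one-step procedure.

```python
import random
import statistics


def naive_synthetic(data, rng):
    """Sample a synthetic dataset from a normal model fitted by maximum likelihood."""
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)  # MLE of sigma (divides by n, not n - 1)
    return [rng.gauss(mu_hat, sigma_hat) for _ in data]


def matched_synthetic(data, rng):
    """Hypothetical illustration (not the paper's method): draw standard normal
    seeds, then shift and rescale them so the synthetic sample has exactly the
    same MLE (mean, population standard deviation) as the observed data."""
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)
    seeds = [rng.gauss(0.0, 1.0) for _ in data]
    m = statistics.fmean(seeds)
    s = statistics.pstdev(seeds)  # positive with probability 1 for n >= 2
    return [mu_hat + sigma_hat * (z - m) / s for z in seeds]


def mean_variance(generator, n=50, reps=4000, seed=0):
    """Monte Carlo variance of the synthetic-sample mean when the true model is N(0, 1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(generator(data, rng)))
    return statistics.variance(means)


if __name__ == "__main__":
    n = 50
    print("efficient variance 1/n:", 1 / n)
    print("naive synthetic       :", mean_variance(naive_synthetic, n))
    print("matched synthetic     :", mean_variance(matched_synthetic, n))
```

Running the script should show the naive synthetic mean with roughly twice the variance of the matched one (close to 2/n versus 1/n for n = 50); the exact figures depend on the random seed.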


Persistent Identifier: http://hdl.handle.net/10722/333967
ISSN: 1017-0405
2023 Impact Factor: 1.5
2023 SCImago Journal Rankings: 1.368

 

DC Field                  Value                                     Language
dc.contributor.author     Awan, Jordan                              -
dc.contributor.author     Cai, Zhanrui                              -
dc.date.accessioned       2023-10-10T03:15:01Z                      -
dc.date.available         2023-10-10T03:15:01Z                      -
dc.date.issued            2023-09-06                                -
dc.identifier.citation    Statistica Sinica, 2024, v. Forthcoming   -
dc.identifier.issn        1017-0405                                 -
dc.identifier.uri         http://hdl.handle.net/10722/333967        -
dc.language               eng                                       -
dc.publisher              Institute of Statistical Science          -
dc.relation.ispartof      Statistica Sinica                         -
dc.title                  One Step to Efficient Synthetic Data      -
dc.type                   Article                                   -
dc.identifier.doi         10.5705/ss.202022.0274                    -
dc.identifier.volume      Forthcoming                               -
dc.identifier.issnl       1017-0405                                 -
