Article: One Step to Efficient Synthetic Data
Title | One Step to Efficient Synthetic Data |
---|---|
Authors | Awan, Jordan; Cai, Zhanrui |
Issue Date | 6-Sep-2023 |
Publisher | Institute of Statistical Science |
Citation | Statistica Sinica, 2024, v. Forthcoming |
Abstract | A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error and show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions. |
Persistent Identifier | http://hdl.handle.net/10722/333967 |
ISSN | 1017-0405 (2023 Impact Factor: 1.5; 2023 SCImago Journal Rankings: 1.368) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Awan, Jordan | - |
dc.contributor.author | Cai, Zhanrui | - |
dc.date.accessioned | 2023-10-10T03:15:01Z | - |
dc.date.available | 2023-10-10T03:15:01Z | - |
dc.date.issued | 2023-09-06 | - |
dc.identifier.citation | Statistica Sinica, 2024, v. Forthcoming | - |
dc.identifier.issn | 1017-0405 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333967 | - |
dc.description.abstract | A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators, and the joint distribution of the sample is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data, which satisfy differential privacy. In the case of continuous random variables, we prove that our method preserves the efficient estimator with asymptotically negligible error and show through simulations that this property holds for discrete distributions as well. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform hypothesis tests in the presence of intractable likelihood functions. | - |
dc.language | eng | - |
dc.publisher | Institute of Statistical Science | - |
dc.relation.ispartof | Statistica Sinica | - |
dc.title | One Step to Efficient Synthetic Data | - |
dc.type | Article | - |
dc.identifier.doi | 10.5705/ss.202022.0274 | - |
dc.identifier.volume | Forthcoming | - |
dc.identifier.issnl | 1017-0405 | - |