On Pooling of Data and Its Relative Efficiency

Xu, J; Kuk, A

Publisher Website: 10.1111/insr.12070
Scopus: eid_2-s2.0-84938064675
WOS: WOS:000358789300013
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Statistics & Actuarial Science: Journal/Magazine Articles

Article: On Pooling of Data and Its Relative Efficiency

Title	On Pooling of Data and Its Relative Efficiency
Authors	Xu, J; Kuk, A
Keywords	Asymptotic Relative Efficiency; Case–Control Study; Gaussian Estimation; Haplotype Frequency Estimation; Interaction Between Genes; Lattice Theory; Non‐Parametric Maximum Likelihood
Issue Date	2015
Publisher	International Statistical Institute. The Journal's web site is located at http://www.cbs.nl/isi/isr.htm
Citation	International Statistical Review, 2015, v. 83, p. 309-323. DOI: http://dx.doi.org/10.1111/insr.12070
Abstract	Pooling of data is often carried out to protect privacy or to save cost, with the claimed advantage that it does not lead to much loss of efficiency. We argue that this does not give the complete picture as the estimation of different parameters is affected to different degrees by pooling. We establish a ladder of efficiency loss for estimating the mean, variance, skewness and kurtosis, and more generally multivariate joint cumulants, in powers of the pool size. The asymptotic efficiency of the pooled data non‐parametric/parametric maximum likelihood estimator relative to the corresponding unpooled data estimator is reduced by a factor equal to the pool size whenever the order of the cumulant to be estimated is increased by one. The implications of this result are demonstrated in case–control genetic association studies with interactions between genes. Our findings provide a guideline for the discriminate use of data pooling in practice and the assessment of its relative efficiency. As exact maximum likelihood estimates are difficult to obtain if the pool size is large, we address briefly how to obtain computationally efficient estimates from pooled data and suggest Gaussian estimation and non‐parametric maximum likelihood as two feasible methods.
Persistent Identifier	http://hdl.handle.net/10722/221683
ISSN	0306-7734 (2023 Impact Factor: 1.7; 2023 SCImago Journal Rankings: 1.048)
ISI Accession Number ID	WOS:000358789300013
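
The first two rungs of the efficiency ladder described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' method: it assumes that pooling means observing only the sum of each group of k independent values, that the data are standard normal, and that simple moment estimators are used in place of the maximum likelihood estimators studied in the paper. The function name `simulate_efficiency` and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_efficiency(n=120, k=4, reps=2000, seed=0):
    """Monte Carlo relative efficiency (pooled vs. unpooled) for mean and variance.

    Assumptions (illustrative only): i.i.d. standard normal data, pools are
    sums of k consecutive observations, and moment estimators are used.
    """
    rng = random.Random(seed)
    mean_u, mean_p, var_u, var_p = [], [], [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Pooling: only the sum of each consecutive group of k values is observed.
        pools = [sum(x[i:i + k]) for i in range(0, n, k)]
        mean_u.append(statistics.fmean(x))
        # E[pool sum] = k * mu, so divide the average pool sum by k.
        mean_p.append(statistics.fmean(pools) / k)
        var_u.append(statistics.variance(x))
        # Var(pool sum) = k * sigma^2 for independent individuals.
        var_p.append(statistics.variance(pools) / k)
    # Relative efficiency = Var(unpooled estimator) / Var(pooled estimator).
    eff_mean = statistics.variance(mean_u) / statistics.variance(mean_p)
    eff_var = statistics.variance(var_u) / statistics.variance(var_p)
    return eff_mean, eff_var


eff_mean, eff_var = simulate_efficiency()
print(f"relative efficiency, mean: {eff_mean:.3f}")
print(f"relative efficiency, variance: {eff_var:.3f}")
```

Under these assumptions the pooled and unpooled estimates of the mean coincide, so their relative efficiency is essentially 1, while the relative efficiency for the variance should come out near 1/k. That matches the abstract's statement that efficiency drops by a factor equal to the pool size when the cumulant order rises by one.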

DC Field	Value	Language
dc.contributor.author	Xu, J	-
dc.contributor.author	Kuk, A	-
dc.date.accessioned	2015-12-04T15:29:05Z	-
dc.date.available	2015-12-04T15:29:05Z	-
dc.date.issued	2015	-
dc.identifier.citation	International Statistical Review, 2015, v. 83, p. 309-323	-
dc.identifier.issn	0306-7734	-
dc.identifier.uri	http://hdl.handle.net/10722/221683	-
dc.description.abstract	Pooling of data is often carried out to protect privacy or to save cost, with the claimed advantage that it does not lead to much loss of efficiency. We argue that this does not give the complete picture as the estimation of different parameters is affected to different degrees by pooling. We establish a ladder of efficiency loss for estimating the mean, variance, skewness and kurtosis, and more generally multivariate joint cumulants, in powers of the pool size. The asymptotic efficiency of the pooled data non‐parametric/parametric maximum likelihood estimator relative to the corresponding unpooled data estimator is reduced by a factor equal to the pool size whenever the order of the cumulant to be estimated is increased by one. The implications of this result are demonstrated in case–control genetic association studies with interactions between genes. Our findings provide a guideline for the discriminate use of data pooling in practice and the assessment of its relative efficiency. As exact maximum likelihood estimates are difficult to obtain if the pool size is large, we address briefly how to obtain computationally efficient estimates from pooled data and suggest Gaussian estimation and non‐parametric maximum likelihood as two feasible methods.	-
dc.language	eng	-
dc.publisher	International Statistical Institute. The Journal's web site is located at http://www.cbs.nl/isi/isr.htm	-
dc.relation.ispartof	International Statistical Review	-
dc.subject	Asymptotic Relative Efficiency	-
dc.subject	Case–Control Study	-
dc.subject	Gaussian Estimation	-
dc.subject	Haplotype Frequency Estimation	-
dc.subject	Interaction Between Genes	-
dc.subject	Lattice Theory	-
dc.subject	Non‐Parametric Maximum Likelihood	-
dc.title	On Pooling of Data and Its Relative Efficiency	-
dc.type	Article	-
dc.identifier.email	Xu, J: xujf@hku.hk	-
dc.identifier.authority	Xu, J=rp02086	-
dc.identifier.doi	10.1111/insr.12070	-
dc.identifier.scopus	eid_2-s2.0-84938064675	-
dc.identifier.volume	83	-
dc.identifier.spage	309	-
dc.identifier.epage	323	-
dc.identifier.isi	WOS:000358789300013	-
dc.identifier.issnl	0306-7734	-
