Article: On Pooling of Data and Its Relative Efficiency

Title: On Pooling of Data and Its Relative Efficiency
Authors: Xu, J; Kuk, A
Keywords: Asymptotic Relative Efficiency; Case–Control Study; Gaussian Estimation; Haplotype Frequency Estimation; Interaction Between Genes; Lattice Theory; Non‐Parametric Maximum Likelihood
Issue Date: 2015
Publisher: International Statistical Institute. The journal's web site is located at http://www.cbs.nl/isi/isr.htm
Citation: International Statistical Review, 2015, v. 83, p. 309-323
Abstract: Pooling of data is often carried out to protect privacy or to save cost, with the claimed advantage that it does not lead to much loss of efficiency. We argue that this does not give the complete picture as the estimation of different parameters is affected to different degrees by pooling. We establish a ladder of efficiency loss for estimating the mean, variance, skewness and kurtosis, and more generally multivariate joint cumulants, in powers of the pool size. The asymptotic efficiency of the pooled data non‐parametric/parametric maximum likelihood estimator relative to the corresponding unpooled data estimator is reduced by a factor equal to the pool size whenever the order of the cumulant to be estimated is increased by one. The implications of this result are demonstrated in case–control genetic association studies with interactions between genes. Our findings provide a guideline for the discriminate use of data pooling in practice and the assessment of its relative efficiency. As exact maximum likelihood estimates are difficult to obtain if the pool size is large, we address briefly how to obtain computationally efficient estimates from pooled data and suggest Gaussian estimation and non‐parametric maximum likelihood as two feasible methods.
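The first two rungs of the ladder described in the abstract can be checked numerically in a simple special case. The sketch below is not taken from the article: it assumes Gaussian data, equal pool sizes, and that only the average of each pool is observed, and it compares plain moment estimators rather than the maximum likelihood estimators studied by the authors. The function name `simulated_relative_efficiency` and its parameters are hypothetical. Under these assumptions the mean (first cumulant) should lose essentially nothing, while the variance (second cumulant) should have relative efficiency close to 1/pool_size.

```python
import numpy as np


def simulated_relative_efficiency(pool_size, n_individuals=2000, n_reps=2000, seed=0):
    """Monte Carlo relative efficiency of pooled vs. unpooled estimators.

    Illustrative assumptions (not from the article): N(0, 1) data, equal pool
    sizes, and only pool averages observed after pooling.
    Returns Var(unpooled estimator) / Var(pooled estimator) for the mean and
    for the variance.
    """
    rng = np.random.default_rng(seed)
    n_pools = n_individuals // pool_size

    # x[r, j, i]: replicate r, pool j, individual i within that pool
    x = rng.standard_normal((n_reps, n_pools, pool_size))
    individuals = x.reshape(n_reps, -1)   # what an unpooled study would see
    pool_means = x.mean(axis=2)           # what a pooled study would see

    # First cumulant: the average of pool averages equals the overall average.
    mean_unpooled = individuals.mean(axis=1)
    mean_pooled = pool_means.mean(axis=1)

    # Second cumulant: Var(pool average) = sigma^2 / pool_size, so rescale.
    var_unpooled = individuals.var(axis=1, ddof=1)
    var_pooled = pool_size * pool_means.var(axis=1, ddof=1)

    return {
        "mean": mean_unpooled.var() / mean_pooled.var(),
        "variance": var_unpooled.var() / var_pooled.var(),
    }


if __name__ == "__main__":
    for k in (2, 4):
        print(k, simulated_relative_efficiency(k))
```

For example, `simulated_relative_efficiency(4)` should return a value near 1 for the mean and near 0.25 for the variance, up to Monte Carlo error. Higher-order cumulants such as skewness and kurtosis are left out because their estimators need far larger samples to simulate reliably.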
Persistent Identifier: http://hdl.handle.net/10722/221683
ISSN: 0306-7734
2015 Impact Factor: 1.789
2015 SCImago Journal Rankings: 1.216
ISI Accession Number ID: WOS:000358789300013

 

DC Field | Value | Language
dc.contributor.author | Xu, J | -
dc.contributor.author | Kuk, A | -
dc.date.accessioned | 2015-12-04T15:29:05Z | -
dc.date.available | 2015-12-04T15:29:05Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | International Statistical Review, 2015, v. 83, p. 309-323 | -
dc.identifier.issn | 0306-7734 | -
dc.identifier.uri | http://hdl.handle.net/10722/221683 | -
dc.language | eng | -
dc.publisher | International Statistical Institute. The Journal's web site is located at http://www.cbs.nl/isi/isr.htm | -
dc.relation.ispartof | International Statistical Review | -
dc.subject | Asymptotic Relative Efficiency | -
dc.subject | Case–Control Study | -
dc.subject | Gaussian Estimation | -
dc.subject | Haplotype Frequency Estimation | -
dc.subject | Interaction Between Genes | -
dc.subject | Lattice Theory | -
dc.subject | Non‐Parametric Maximum Likelihood | -
dc.title | On Pooling of Data and Its Relative Efficiency | -
dc.type | Article | -
dc.identifier.email | Xu, J: xujf@hku.hk | -
dc.identifier.authority | Xu, J=rp02086 | -
dc.identifier.doi | 10.1111/insr.12070 | -
dc.identifier.scopus | eid_2-s2.0-84938064675 | -
dc.identifier.volume | 83 | -
dc.identifier.spage | 309 | -
dc.identifier.epage | 323 | -
dc.identifier.isi | WOS:000358789300013 | -
