A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants

Kuk, AYC; Li, X; Xu, J

Publisher Website: 10.1002/sim.5540
Scopus: eid_2-s2.0-84875279361
PMID: 22855289
WOS: WOS:000316625600009
Appears in Collections:
- Statistics & Actuarial Science: Journal/Magazine Articles

Title	A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants
Authors	Kuk, AYC; Li, X; Xu, J
Keywords	Collapsed Data; EM Algorithm; Genetic Association; Haplotype Frequency Estimation; Rare Variants; Union Probability
Issue Date	2013
Publisher	John Wiley & Sons Ltd. The Journal's web site is located at http://www.interscience.wiley.com/jpages/0277-6715/
Citation	Statistics in Medicine, 2013, v. 32, p. 1343-1360. DOI: http://dx.doi.org/10.1002/sim.5540
Abstract	Haplotype information could lead to more powerful tests of genetic association than single‐locus analyses but it is not easy to estimate haplotype frequencies from genotype data due to phase ambiguity. The challenge is compounded when individuals are pooled together to save costs or to increase sample size, which is crucial in the study of rare variants. Existing expectation–maximization type algorithms are slow and cannot cope with large pool size or long haplotypes. We show that by collapsing the total allele frequencies of each pool suitably, the maximum likelihood estimates of haplotype frequencies based on the collapsed data can be calculated very quickly regardless of pool size and haplotype length. We provide a running time analysis to demonstrate the considerable savings in time that the collapsed data method can bring. The method is particularly well suited to estimating certain union probabilities useful in the study of rare variants. We provide theoretical and empirical evidence to suggest that the proposed estimation method will not suffer much loss in efficiency if the variants are rare. We use the method to analyze re‐sequencing data collected from a case control study involving 148 obese persons and 150 controls. Focusing on a region containing 25 rare variants around the gene, our method selects three rare variants as potentially causal. This is more parsimonious than the 12 variants selected by a recently proposed covering method. From another set of 32 rare variants around the gene, we discover an interesting potential interaction between two of them. Copyright © 2012 John Wiley & Sons, Ltd.
Persistent Identifier	http://hdl.handle.net/10722/221674
ISSN	0277-6715 (2023 Impact Factor: 1.8; 2023 SCImago Journal Rankings: 1.348)
ISI Accession Number ID	WOS:000316625600009

DC Field	Value	Language
dc.contributor.author	Kuk, AYC	-
dc.contributor.author	Li, X	-
dc.contributor.author	Xu, J	-
dc.date.accessioned	2015-12-04T15:29:00Z	-
dc.date.available	2015-12-04T15:29:00Z	-
dc.date.issued	2013	-
dc.identifier.citation	Statistics in Medicine, 2013, v. 32, p. 1343-1360	-
dc.identifier.issn	0277-6715	-
dc.identifier.uri	http://hdl.handle.net/10722/221674	-
dc.language	eng	-
dc.publisher	John Wiley & Sons Ltd. The Journal's web site is located at http://www.interscience.wiley.com/jpages/0277-6715/	-
dc.relation.ispartof	Statistics in Medicine	-
dc.subject	Collapsed Data	-
dc.subject	EM Algorithm	-
dc.subject	Genetic Association	-
dc.subject	Haplotype Frequency Estimation	-
dc.subject	Rare Variants	-
dc.subject	Union Probability	-
dc.title	A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants	-
dc.type	Article	-
dc.identifier.email	Xu, J: xujf@hku.hk	-
dc.identifier.authority	Xu, J=rp02086	-
dc.identifier.doi	10.1002/sim.5540	-
dc.identifier.pmid	22855289	-
dc.identifier.scopus	eid_2-s2.0-84875279361	-
dc.identifier.volume	32	-
dc.identifier.spage	1343	-
dc.identifier.epage	1360	-
dc.identifier.isi	WOS:000316625600009	-
dc.identifier.issnl	0277-6715	-
