Article: Semiparametric analysis of zero-inflated count data
Title  Semiparametric analysis of zero-inflated count data

Authors  
Keywords  Asymptotically efficient; Generalized partly linear model; Sieve maximum likelihood estimator; Zero-inflated Poisson regression model
Issue Date  2006 
Publisher  Blackwell Publishing Ltd. The Journal's web site is located at http://www.blackwellpublishing.com/journals/BIOM 
Citation  Biometrics, 2006, v. 62 n. 4, p. 996-1003+1283
Abstract  Medical and public health research often involves the analysis of count data that exhibit a substantially large proportion of zeros, such as the number of heart attacks and the number of days of missed primary activities in a given period. A zero-inflated Poisson regression model, which hypothesizes a two-point heterogeneity in the population characterized by a binary random effect, is generally used to model such data. Subjects are broadly categorized into a low-risk group, which gives rise to structural zero counts, and a high-risk (or normal) group, whose counts can be modeled by a Poisson regression model. The main aim is to identify the explanatory variables that have significant effects on (i) the probability that the subject is from the low-risk group, by means of a logistic regression formulation; and (ii) the magnitude of the counts, given that the subject is from the high-risk group, by means of a Poisson regression where the effects of the covariates are assumed to be linearly related to the natural logarithm of the mean of the counts. In this article we consider a semiparametric zero-inflated Poisson regression model that postulates a possibly nonlinear relationship between the natural logarithm of the mean of the counts and a particular covariate. A sieve maximum likelihood estimation method is proposed. Asymptotic properties of the proposed sieve maximum likelihood estimators are discussed. Under some mild conditions, the estimators are shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method. For illustration, the method is applied to a data set from a public health survey conducted in Indonesia where the variable of interest is the number of days of missed primary activities due to illness in a 4-week period. © 2006, The International Biometric Society.
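The model structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameter names `gamma` and `beta`, the use of the same covariate vector in both parts, and the user-supplied function `g` standing in for the possibly nonlinear covariate effect are all assumptions made for illustration. The sieve estimation step itself is not shown; the sketch only evaluates a zero-inflated Poisson log-likelihood with a logistic model for the low-risk probability and a log link for the Poisson mean.

```python
import math

def zip_pmf(y, p, lam):
    """P(Y = y) for a zero-inflated Poisson.

    p   : probability of the low-risk group (structural zero)
    lam : Poisson mean for the high-risk group
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can be structural or an ordinary Poisson zero.
        return p + (1.0 - p) * poisson
    return (1.0 - p) * poisson

def logistic(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def zip_loglik(ys, xs, zs, gamma, beta, g=lambda z: 0.0):
    """Log-likelihood with logit(p_i) = gamma . x_i and
    log(lam_i) = beta . x_i + g(z_i).

    g is a hypothetical placeholder for the nonlinear effect of one
    covariate z; the default g = 0 gives the fully parametric model.
    """
    total = 0.0
    for y, x, z in zip(ys, xs, zs):
        p = logistic(sum(c * v for c, v in zip(gamma, x)))
        lam = math.exp(sum(c * v for c, v in zip(beta, x)) + g(z))
        total += math.log(zip_pmf(y, p, lam))
    return total
```

Passing a flexible function for `g` (for example a spline expansion whose coefficients are estimated jointly with `gamma` and `beta`) would be one way to move toward the semiparametric setting, under the assumptions stated above.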
Persistent Identifier  http://hdl.handle.net/10722/82835 
ISSN  2015 Impact Factor: 1.36; 2015 SCImago Journal Rankings: 1.906
ISI Accession Number ID  
