
postgraduate thesis: Nonparametric and semiparametric inference for survival data

Title: Nonparametric and semiparametric inference for survival data
Authors: Su, Wen [蘇雯]
Advisor(s): Yin, G
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Su, W. [蘇雯]. (2023). Nonparametric and semiparametric inference for survival data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The analysis of survival data plays a crucial role in driving advances and innovations in modern medicine and clinical studies. Survival data come in various types, such as current status data, right-censored data, and panel count data, and each type exhibits unique characteristics that pose significant challenges for analysis; existing methods often suffer grave consequences when the underlying model is misspecified. This thesis proposes several advanced methodologies, using recently developed tools in deep learning as well as innovative functional forms, to tackle overlooked yet important problems in survival analysis.

Current status data and right-censored data are commonly encountered in biomedical studies, econometrics, and social science. Because of censoring, the exact failure time is only partially observed, which makes it difficult to estimate the hazard function and the distribution of survival times. To address the limitations of traditional approaches, the first part of this thesis proposes a model-free two-stage generative approach for estimating the conditional cumulative distribution function given predictors. The first stage learns a conditional generator nonparametrically for the joint conditional distribution of observation times and event status, and the second stage constructs the nonparametric maximum likelihood estimators of the conditional distribution functions based on samples from the conditional generator. Consistency of the proposed estimator is established. Simulation studies under various settings show the superior performance of the deep conditional generative approach over classical modeling approaches, and an application to Parvovirus B19 seroprevalence data yields reasonable predictions.

The second part proposes a novel deep learning approach to nonparametric statistical inference for the conditional hazard function of survival time with right-censored data. A deep neural network (DNN) is used to approximate the logarithm of the conditional hazard function given covariates, which yields a DNN likelihood-based estimator of the conditional hazard function. This estimation approach grants model flexibility and hence relaxes structural and functional assumptions on the conditional hazard or survival functions. The consistency, convergence rate, and functional asymptotic normality of the proposed estimator are established. Furthermore, state-of-the-art one-sample tests for goodness-of-fit evaluation and two-sample tests for treatment comparison are proposed. Both simulation studies and a real data analysis show the superior performance of the proposed estimators and tests in comparison with existing methods.

Panel count data typically refer to data arising from studies with recurrent events, in which subjects are observed only at discrete time points rather than continuously. The third and last parts of this thesis investigate a general situation in which a recurrent event process is eventually truncated by an informative terminal event and the behavior of the recurrent event process near the terminal event is of special interest. For nonparametric inference, a reversed mean model for the mean function of the recurrent event process is introduced and implemented using a two-stage sieve likelihood-based method, which overcomes the computational difficulties arising from a nuisance functional parameter involved in the likelihood. The consistency and the convergence rate of the two-stage estimator are established. Allowing for a convergence rate slower than the standard rate, a general weak convergence theory of $M$-estimators with a nuisance functional parameter is developed and then applied to the proposed estimator to derive its asymptotic normality. Furthermore, a class of two-sample tests is developed. The proposed methods are evaluated with extensive simulation studies and illustrated with panel count data from the Chinese Longitudinal Healthy Longevity Study.

The last part proposes a semiparametric reversed mean model and develops a two-stage sieve likelihood-based method to estimate the baseline mean function and the covariate effects. This approach overcomes the computational difficulties arising from a nuisance functional parameter involved in the likelihood based on a Poisson process assumption. The consistency, convergence rate, and asymptotic normality of the proposed two-stage estimator are established, and the estimator is robust to departures from the underlying Poisson process assumption. The proposed method is evaluated with extensive simulation studies and illustrated with panel count data from a longitudinal healthy longevity study and a bladder tumor study.
Degree: Doctor of Philosophy
Subject: Survival analysis (Biometry); Mathematical statistics
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/328610

 

DC Field | Value | Language
dc.contributor.advisor | Yin, G | -
dc.contributor.author | Su, Wen | -
dc.contributor.author | 蘇雯 | -
dc.date.accessioned | 2023-06-29T05:44:41Z | -
dc.date.available | 2023-06-29T05:44:41Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Su, W. [蘇雯]. (2023). Nonparametric and semiparametric inference for survival data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/328610 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Survival analysis (Biometry) | -
dc.subject.lcsh | Mathematical statistics | -
dc.title | Nonparametric and semiparametric inference for survival data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044695781703414 | -
