
postgraduate thesis: On U statistics for large scale or high dimensional data

Title: On U statistics for large scale or high dimensional data
Authors: Zhao, Na
Issue Date: 2021
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: U statistics are an important tool in both parametric and nonparametric statistics. Their generality and simple form make them popular in many common models. However, estimating equations with a U-statistic structure become computationally intensive when the sample size n or the parameter dimension p is large. To address this problem, we explore new methods for applying U statistics in these settings. In the first part, this thesis considers U-type estimating equations in empirical likelihood with a diverging number of parameters. A penalized jackknife empirical likelihood method is proposed that attains a consistent estimator while preserving computational efficiency. Selection consistency and Wilks' theorem are established for the proposed estimator in the high-dimensional setting. Simulation studies and a real-data case study demonstrate the performance of the proposed approach in both selection and estimation. Second, this thesis introduces a rank-based batch stochastic gradient accelerated failure time method for large-scale survival data. The proposed method transforms the original optimization of a U statistic with O(n^2) computational complexity into an iterative update problem that processes a small batch of data at each step. Scalable inference for the proposed estimator can be carried out in parallel, which substantially reduces the computational burden. Relative efficiency is discussed to assess the efficiency of the proposed methodology. Simulation studies and a breast cancer study show the advantages of the proposed approach over some existing methods. Finally, this thesis considers a general application of two-sample U statistics of degree 2 to the generalized transformation regression model. The traditional solution to this model is the Maximum Rank Correlation estimator.
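The second part of the abstract relies on the fact that a rank-based accelerated failure time (AFT) objective is a degree-2 U statistic. Evaluating it exactly touches all n(n-1) ordered pairs. The Python sketch below illustrates the batch idea under stated assumptions.

It uses the Gehan-type loss L_n(beta) = sum over i != j of delta_i * max(e_j(beta) - e_i(beta), 0) / (n(n-1)), where e_i(beta) = log Y_i - x_i'beta. This is a common rank-based AFT objective, but the abstract does not name it, so treating it as the thesis's objective is an assumption. The function names, the batch size, the 1/sqrt(t) step size and the averaging of second-half iterates are hypothetical choices. The parallel inference procedure is not reproduced.

The batch is drawn uniformly without replacement, so the batch subgradient is an unbiased estimate of the full-sample one. Each step therefore costs O(m^2) for a batch of size m instead of O(n^2).

```python
import numpy as np


def gehan_loss(beta, X, logY, delta):
    """Full Gehan-type U statistic over all n(n-1) ordered pairs: O(n^2) work."""
    e = logY - X @ beta                       # residuals e_i(beta)
    diff = e[None, :] - e[:, None]            # diff[i, j] = e_j - e_i
    n = len(e)
    return (delta[:, None] * np.maximum(diff, 0.0)).sum() / (n * (n - 1))


def batch_subgradient(beta, Xb, logYb, deltab):
    """Subgradient of the same U statistic restricted to pairs inside one batch."""
    e = logYb - Xb @ beta
    diff = e[None, :] - e[:, None]            # e_j - e_i
    w = deltab[:, None] * (diff > 0)          # delta_i * 1{e_j > e_i}
    m = len(e)
    # d(e_j - e_i)/d(beta) = x_i - x_j, so sum_ij w_ij (x_i - x_j) splits into
    # row sums and column sums of w.
    return (w.sum(axis=1) @ Xb - w.sum(axis=0) @ Xb) / (m * (m - 1))


def batch_sgd_aft(X, logY, delta, batch_size=64, n_steps=4000, lr0=0.5, seed=0):
    """Minibatch SGD on the Gehan loss; returns the mean of the second-half iterates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, avg = np.zeros(p), np.zeros(p)
    burn = n_steps // 2
    for t in range(1, n_steps + 1):
        idx = rng.choice(n, size=batch_size, replace=False)
        beta = beta - lr0 / np.sqrt(t) * batch_subgradient(beta, X[idx], logY[idx], delta[idx])
        if t > burn:
            avg += (beta - avg) / (t - burn)  # running mean of post-burn-in iterates
    return avg


def simulate_aft(n, beta0, seed=0):
    """Hypothetical AFT data: log T = x'beta0 + N(0, 1), independent log-normal censoring."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, len(beta0)))
    logT = X @ beta0 + rng.standard_normal(n)
    logC = rng.normal(1.0, 1.5, size=n)
    delta = (logT <= logC).astype(float)
    return X, np.minimum(logT, logC), delta
```

For example, `batch_sgd_aft(*simulate_aft(2000, np.array([1.0, -0.5])))` should return an estimate close to (1, -0.5). The Gehan loss depends only on residual differences, so it carries no intercept, and X should not contain a column of ones.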
The difficulty with the Maximum Rank Correlation estimator lies in its optimization: the U-type objective function is neither smooth nor continuous, which makes computation difficult when the sample size is large. A self-induced batch stochastic gradient descent method is proposed and then extended to censored data. The proposed method avoids the dilemma of choosing a smoothing parameter while retaining consistency and asymptotic normality under a reasonable amount of smoothing. Simulation experiments involving various models and a real-data application demonstrate the performance of the proposed methodology.
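The maximum rank correlation objective counts concordant pairs: sum over i != j of 1{y_i > y_j} * 1{x_i'beta > x_j'beta}. This is a step function of beta, so it has no useful gradient. The Python sketch below is one hedged reading of the smoothing idea, not the thesis's algorithm. It replaces the second indicator with the normal CDF Phi(u_ij / s_ij), where u_ij = (x_i - x_j)'beta. The bandwidth s_ij shrinks like n^(-1/2) and is computed from the data rather than chosen as a tuning parameter.

Several details are hypothetical assumptions:

- The first coefficient is fixed at 1 for identifiability.
- s_ij = ||(x_i - x_j)[1:]|| / sqrt(n), which uses an identity matrix where a self-induced scheme would use the estimator's own covariance.
- The optimizer settings are illustrative.

The extension to censored data is not shown.

```python
import numpy as np


def smoothed_mrc_grad(theta, Xb, yb, n):
    """Batch gradient in theta of a smoothed MRC objective, with beta = (1, theta).

    Each indicator 1{(x_i - x_j)'beta > 0} is replaced by Phi(u_ij / s_ij), with
    s_ij = ||(x_i - x_j)[1:]|| / sqrt(n): a hypothetical bandwidth (identity covariance).
    """
    beta = np.concatenate(([1.0], theta))
    dX = Xb[:, None, :] - Xb[None, :, :]                 # dX[i, j] = x_i - x_j
    dZ = dX[:, :, 1:]                                     # coordinates paired with theta
    u = dX @ beta                                         # u_ij = (x_i - x_j)'beta
    s = np.linalg.norm(dZ, axis=2) / np.sqrt(n)
    s = np.where(s > 0, s, 1.0)                           # diagonal i == j: avoid 0/0
    concordant = (yb[:, None] > yb[None, :]).astype(float)  # 1{y_i > y_j}
    w = concordant * np.exp(-0.5 * (u / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
    m = len(yb)
    # d/dtheta Phi(u_ij / s_ij) = phi(u_ij / s_ij) / s_ij * (x_i - x_j)[1:]
    return np.einsum("ij,ijk->k", w, dZ) / (m * (m - 1))


def batch_sgd_mrc(X, y, batch_size=64, n_steps=4000, lr0=1.0, seed=0):
    """Minibatch stochastic gradient ascent; returns (1, averaged theta)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta, avg = np.zeros(p - 1), np.zeros(p - 1)
    burn = n_steps // 2
    for t in range(1, n_steps + 1):
        idx = rng.choice(n, size=batch_size, replace=False)
        theta = theta + lr0 / np.sqrt(t) * smoothed_mrc_grad(theta, X[idx], y[idx], n)
        if t > burn:
            avg += (theta - avg) / (t - burn)   # running mean of post-burn-in iterates
    return np.concatenate(([1.0], avg))
```

For instance, suppose y is generated as exp(x'beta0 + error) with beta0 = (1, -0.5). The returned vector should then lie near (1, -0.5), because the monotone transformation exp does not change the ranks of y.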
Degree: Doctor of Philosophy
Subject: Estimation theory
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/325720

 

DC Field | Value | Language
dc.contributor.author | Zhao, Na | -
dc.date.accessioned | 2023-03-02T16:32:17Z | -
dc.date.available | 2023-03-02T16:32:17Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/325720 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Estimation theory | -
dc.title | On U statistics for large scale or high dimensional data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044649904303414 | -
