On U statistics for large scale or high dimensional data

Zhao, Na

Appears in Collections:
- HKU Theses Online
- Statistics & Actuarial Science: Theses

postgraduate thesis: On U statistics for large scale or high dimensional data

Title	On U statistics for large scale or high dimensional data
Authors	Zhao, Na
Issue Date	2021
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	U statistics are an important structure in the study of both parametric and nonparametric statistics. Their generality and simple form make them popular in many common models. However, estimating equations with a U-statistic structure become computationally intensive when the sample size n or the parameter dimension p is large. To address this problem, we explore new methods for applying U statistics under these settings. In the first part, this thesis considers U-type estimating equations in empirical likelihood with a diverging number of parameters. A penalized jackknife empirical likelihood method is proposed that attains a consistent estimator while preserving computational efficiency. Selection consistency and Wilks' theorem are established for the proposed estimator in the high-dimensional setting. Simulation studies and a real case study demonstrate the performance of the proposed approach in both selection and estimation. Secondly, this thesis introduces a rank-based batch stochastic gradient accelerated failure time method for large-scale survival data. The proposed method transforms the original optimization of a U statistic with computational complexity of order O(n^2) into an iterative update problem that processes a small batch of data at each step. Scalable inference for the proposed estimator can be carried out in parallel, which substantially reduces the computational burden. The relative efficiency of the proposed methodology is also discussed. Simulation studies and a breast cancer study show the advantages of the proposed approach over some existing methods. Last, this thesis considers a general application of two-sample U statistics of degree 2 to the generalized transformation regression model. The traditional solution for this model is the Maximum Rank Correlation estimator. The difficulty lies in the optimization procedure: the U-type objective function is neither smooth nor continuous, which makes computation complicated when the sample size is large. A self-induced batch stochastic gradient descent method is suggested and then extended to censored data. The proposed method avoids the dilemma of choosing a smoothing parameter while retaining consistency and asymptotic normality under a reasonable amount of smoothing. Simulation experiments involving various models and a real application demonstrate the performance of the proposed methodology.
Degree	Doctor of Philosophy
Subject	Estimation theory
Dept/Program	Statistics and Actuarial Science
Persistent Identifier	http://hdl.handle.net/10722/325720
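
Illustrative note (not part of the thesis record): the abstract refers to the O(n^2) cost of a degree-2 U statistic and to processing small batches instead of all pairs. The following is a minimal sketch of that general idea only, not the author's estimator. The function names `u_statistic`, `batch_u_statistic`, and `abs_diff`, the choice of kernel (absolute difference, which gives Gini's mean difference), and the batch settings are all assumptions made for illustration.

```python
import random
from itertools import combinations


def u_statistic(xs, kernel):
    """Degree-2 U statistic: mean of a symmetric kernel over all unordered pairs.

    Visits n * (n - 1) / 2 pairs, so the cost grows as O(n^2).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    n_pairs = n * (n - 1) // 2
    return sum(kernel(a, b) for a, b in combinations(xs, 2)) / n_pairs


def batch_u_statistic(xs, kernel, batch_size, n_batches, seed=0):
    """Average of U statistics computed on small random batches.

    Hypothetical illustration: each batch is drawn without replacement and
    costs O(batch_size^2), so the total cost is O(n_batches * batch_size^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_batches):
        batch = rng.sample(xs, batch_size)
        total += u_statistic(batch, kernel)
    return total / n_batches


def abs_diff(a, b):
    """Kernel for Gini's mean difference (an assumed example kernel)."""
    return abs(a - b)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(300)]
    print("full :", u_statistic(data, abs_diff))
    print("batch:", batch_u_statistic(data, abs_diff, batch_size=20, n_batches=200))
```

On a toy input, `u_statistic([1, 2, 4], abs_diff)` averages the three pairwise distances 1, 3, and 2 and returns 2.0. The batch version trades exactness for cost: each batch gives an unbiased but noisier estimate of the same quantity, and averaging over batches reduces that noise.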

DC Field	Value	Language
dc.contributor.author	Zhao, Na	-
dc.date.accessioned	2023-03-02T16:32:17Z	-
dc.date.available	2023-03-02T16:32:17Z	-
dc.date.issued	2021	-
dc.identifier.citation	Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/325720	-
dc.description.abstract	U statistics are an important structure in the study of both parametric and nonparametric statistics. Their generality and simple form make them popular in many common models. However, estimating equations with a U-statistic structure become computationally intensive when the sample size n or the parameter dimension p is large. To address this problem, we explore new methods for applying U statistics under these settings. In the first part, this thesis considers U-type estimating equations in empirical likelihood with a diverging number of parameters. A penalized jackknife empirical likelihood method is proposed that attains a consistent estimator while preserving computational efficiency. Selection consistency and Wilks' theorem are established for the proposed estimator in the high-dimensional setting. Simulation studies and a real case study demonstrate the performance of the proposed approach in both selection and estimation. Secondly, this thesis introduces a rank-based batch stochastic gradient accelerated failure time method for large-scale survival data. The proposed method transforms the original optimization of a U statistic with computational complexity of order O(n^2) into an iterative update problem that processes a small batch of data at each step. Scalable inference for the proposed estimator can be carried out in parallel, which substantially reduces the computational burden. The relative efficiency of the proposed methodology is also discussed. Simulation studies and a breast cancer study show the advantages of the proposed approach over some existing methods. Last, this thesis considers a general application of two-sample U statistics of degree 2 to the generalized transformation regression model. The traditional solution for this model is the Maximum Rank Correlation estimator. The difficulty lies in the optimization procedure: the U-type objective function is neither smooth nor continuous, which makes computation complicated when the sample size is large. A self-induced batch stochastic gradient descent method is suggested and then extended to censored data. The proposed method avoids the dilemma of choosing a smoothing parameter while retaining consistency and asymptotic normality under a reasonable amount of smoothing. Simulation experiments involving various models and a real application demonstrate the performance of the proposed methodology.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Estimation theory	-
dc.title	On U statistics for large scale or high dimensional data	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Statistics and Actuarial Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2022	-
dc.identifier.mmsid	991044649904303414	-
