postgraduate thesis: On U statistics for large scale or high dimensional data
Title | On U statistics for large scale or high dimensional data |
---|---|
Authors | Zhao, Na |
Issue Date | 2021 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | U statistics are an important structure in both parametric and nonparametric statistics. Their generality and simple form make them popular in applications of many common models. However, estimating equations with a U-statistic structure become computationally intensive when the sample size n or the parameter dimension p is large. To address this problem, we explore new methods for applying U statistics in these settings.
In the first part, this thesis considers U-type estimating equations in empirical likelihood with a diverging number of parameters. A penalized jackknife empirical likelihood method is proposed that attains a consistent estimator while preserving computational efficiency. Selection consistency and Wilks' theorem are established for the proposed estimator in the high-dimensional setting. Simulation studies and a real-data case study demonstrate the performance of the proposed approach in both selection and estimation.
Second, this thesis introduces a rank-based batch stochastic gradient accelerated failure time method for large-scale survival data. The proposed method converts the original optimization of a U statistic with computational complexity of order O(n^2) into an iterative update problem that processes a small batch of data at each step. Scalable inference for the proposed estimator can be carried out in parallel, which substantially reduces the computational burden. Relative efficiency is discussed to quantify the efficiency of the proposed methodology. Simulation studies and a breast cancer study show the advantages of the proposed approach over some existing methods.
Last, this thesis considers a general application of two-sample U statistics of degree 2 to the generalized transformation regression model. The traditional solution for this model is the Maximum Rank Correlation estimator. The difficulty lies in the optimization, where the U-type objective function is neither smooth nor continuous, making computation complicated when the sample size is large. A self-induced batch stochastic gradient descent method is suggested and then extended to censored data. The proposed method avoids the dilemma of choosing a smoothing parameter while retaining consistency and asymptotic normality under a reasonable amount of smoothing. Simulation experiments involving various models and a real application demonstrate the performance of the proposed methodology. |
Degree | Doctor of Philosophy |
Subject | Estimation theory |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/325720 |
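The abstract's point about O(n^2) cost can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names `u_statistic` and `batched_u_statistic`, the consecutive-batch splitting rule, and the absolute-difference kernel in the usage note are all assumptions chosen for illustration. The sketch only shows why a degree-2 U statistic touches every pair of observations, and how averaging over small batches reduces the number of kernel evaluations from roughly n^2/2 to roughly n*b/2 for batch size b.

```python
from itertools import combinations


def u_statistic(data, kernel):
    """Full degree-2 U statistic: mean of kernel(x, y) over all unordered pairs.

    Evaluates the kernel n*(n-1)/2 times, hence the O(n^2) cost.
    """
    values = [kernel(x, y) for x, y in combinations(data, 2)]
    if not values:
        raise ValueError("need at least two observations")
    return sum(values) / len(values)


def batched_u_statistic(data, kernel, batch_size):
    """Illustrative batch approximation (an assumption, not the thesis's method).

    Splits the data into consecutive batches, computes the U statistic inside
    each batch, and averages the results. A trailing batch with fewer than
    two observations is dropped.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    batches = [b for b in batches if len(b) >= 2]
    if not batches:
        raise ValueError("need at least two observations")
    values = [u_statistic(b, kernel) for b in batches]
    return sum(values) / len(values)
```

As a usage example, `u_statistic([1, 2, 3, 4], lambda x, y: abs(x - y))` averages six pairwise distances, while `batched_u_statistic` with `batch_size=2` evaluates only two pairs. For independent, identically distributed data each within-batch value estimates the same quantity, so the batch average generally trades some statistical efficiency for lower computational cost.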
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhao, Na | - |
dc.date.accessioned | 2023-03-02T16:32:17Z | - |
dc.date.available | 2023-03-02T16:32:17Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Zhao, N. (2021). On U statistics for large scale or high dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/325720 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Estimation theory | - |
dc.title | On U statistics for large scale or high dimensional data | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044649904303414 | - |