
Postgraduate thesis: Variable selection and prediction for local polynomial regression

Title: Variable selection and prediction for local polynomial regression
Authors: Cheung, Kin Yap [張建熠]
Advisors: Lee, SMS
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Local polynomial estimation has a long history in nonparametric regression and has been generalised to many other regression models. However, it inescapably suffers from the curse of dimensionality: the size of the local neighborhood, which is controlled by the bandwidths, decreases exponentially as the covariate dimension increases. To address this problem, we propose a bandwidth regularization scheme that removes irrelevant variables. We first discuss the proposed procedure in the general nonparametric regression setting with convex loss for high-dimensional data. Selection consistency, as well as the asymptotic properties of the local linear estimator based on the optimised bandwidths, is established. Then a modified Nadaraya--Watson estimator is proposed for variable selection in a nonparametric setting with missing data, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. The method allows information sharing across different missing patterns without affecting the consistency of the estimator. Unlike conventional methods such as those based on imputation or likelihoods, our method requires only mild assumptions on the model and the missing mechanism. For prediction we focus on finding relevant variables for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. The final problem is dimension reduction: the above procedure is extended to bandwidth matrix optimisation to perform variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Compared with most existing methods, the new procedure requires neither explicit bandwidth selection nor an additional step of dimension determination using techniques such as cross-validation or principal components. The selected model is guaranteed a convergence rate no worse than that of the oracle model.
Degree: Doctor of Philosophy
Subjects: Regression analysis; Nonparametric statistics
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/297522
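The abstract's core idea, that bandwidths can act as a regularizer, rests on a simple property of kernel smoothers: as the bandwidth for a covariate grows, the kernel becomes flat in that direction and the covariate drops out of the fit. The sketch below illustrates this with a standard Nadaraya--Watson estimator on toy data; the function name, Gaussian product kernel, and data are illustrative assumptions, not the thesis's actual regularization procedure.

```python
import numpy as np

def nadaraya_watson(X, y, x0, h):
    # Nadaraya--Watson estimate of E[y | x = x0] with a Gaussian product kernel.
    # A very large bandwidth h[j] makes the kernel flat in covariate j, so that
    # covariate no longer influences the weights -- it is effectively removed.
    Z = (X - x0) / h                          # (n, d) scaled distances
    w = np.exp(-0.5 * np.sum(Z ** 2, axis=1))  # product-kernel weights
    return np.sum(w * y) / np.sum(w)

# Toy data: the response depends on x1 only; x2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)

x0 = np.array([0.5, 0.0])
# Huge bandwidth for x2 "removes" it, leaving a one-dimensional smoother in x1.
est = nadaraya_watson(X, y, x0, h=np.array([0.2, 1e6]))
```

Here `est` approximates the true regression function sin(3 * 0.5) at `x0` using, in effect, only the relevant covariate, which is the behaviour the bandwidth regularization scheme aims to select automatically.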

 

DC Field: Value
dc.contributor.advisor: Lee, SMS
dc.contributor.author: Cheung, Kin Yap
dc.contributor.author: 張建熠
dc.date.accessioned: 2021-03-21T11:38:01Z
dc.date.available: 2021-03-21T11:38:01Z
dc.date.issued: 2020
dc.identifier.citation: Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/297522
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Regression analysis
dc.subject.lcsh: Nonparametric statistics
dc.title: Variable selection and prediction for local polynomial regression
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Statistics and Actuarial Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2021
dc.identifier.mmsid: 991044351383703414
