
Postgraduate thesis: Variable selection and prediction for local polynomial regression

Title: Variable selection and prediction for local polynomial regression
Authors: Cheung, Kin Yap [張建熠]
Advisors: Lee, SMS
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Local polynomial estimation has a long history in nonparametric regression and has been generalised to many other regression models. However, it inescapably suffers from the curse of dimensionality: the size of the local neighborhood, which is controlled by the bandwidths, decreases exponentially as the covariate dimension increases. To address this problem, we propose a bandwidth regularization scheme that removes irrelevant variables. We first discuss the proposed procedure in the general nonparametric regression setting with convex loss for high-dimensional data. Selection consistency, as well as the asymptotic properties of the local linear estimator based on the optimised bandwidths, is established. Then a modified Nadaraya--Watson estimator is proposed for variable selection in a nonparametric setting with missing data, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. The method allows information sharing across different missing patterns without affecting the consistency of the estimator. Unlike conventional methods such as those based on imputation or likelihoods, our method requires only mild assumptions on the model and the missing mechanism. For prediction we focus on finding relevant variables for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. The final problem is dimension reduction: the above procedure is extended to bandwidth matrix optimisation to perform variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Compared with most existing methods, the new procedure requires neither explicit bandwidth selection nor an additional step of dimension determination using techniques such as cross-validation or principal components. The selected model is guaranteed a convergence rate no worse than that of the oracle model.
Degree: Doctor of Philosophy
Subjects: Regression analysis; Nonparametric statistics
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/297522
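The abstract's core idea, that bandwidths can act as a regularizer, rests on a simple property of kernel smoothers: as the bandwidth for a covariate grows, the kernel becomes flat in that direction and the covariate drops out of the fit. The sketch below illustrates this with a standard Nadaraya--Watson estimator on toy data; the function name, Gaussian product kernel, and data are illustrative assumptions, not the thesis's actual regularization procedure.

```python
import numpy as np

def nadaraya_watson(X, y, x0, h):
    # Nadaraya--Watson estimate of E[y | x = x0] with a Gaussian product kernel.
    # A very large bandwidth h[j] makes the kernel flat in covariate j, so that
    # covariate no longer influences the weights -- it is effectively removed.
    Z = (X - x0) / h                          # (n, d) scaled distances
    w = np.exp(-0.5 * np.sum(Z ** 2, axis=1))  # product-kernel weights
    return np.sum(w * y) / np.sum(w)

# Toy data: the response depends on x1 only; x2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)

x0 = np.array([0.5, 0.0])
# Huge bandwidth for x2 "removes" it, leaving a one-dimensional smoother in x1.
est = nadaraya_watson(X, y, x0, h=np.array([0.2, 1e6]))
```

Here `est` approximates the true regression function sin(3 * 0.5) at `x0` using, in effect, only the relevant covariate, which is the behaviour the bandwidth regularization scheme aims to select automatically.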

 

DC Field: Value
dc.contributor.advisor: Lee, SMS
dc.contributor.author: Cheung, Kin Yap
dc.contributor.author: 張建熠
dc.date.accessioned: 2021-03-21T11:38:01Z
dc.date.available: 2021-03-21T11:38:01Z
dc.date.issued: 2020
dc.identifier.citation: Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/297522
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Regression analysis
dc.subject.lcsh: Nonparametric statistics
dc.title: Variable selection and prediction for local polynomial regression
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Statistics and Actuarial Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2021
dc.identifier.mmsid: 991044351383703414
