
postgraduate thesis: The sample size design in ultrahigh dimensional regression models and extensive simulation validation

Title: The sample size design in ultrahigh dimensional regression models and extensive simulation validation
Authors: Duan, Zhenchen [段振辰]
Issue Date: 2016
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Duan, Z. [段振辰]. (2016). The sample size design in ultrahigh dimensional regression models and extensive simulation validation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Most classical regression modeling methods are based on correlation learning. In ultrahigh dimensional scenarios, the correlations between the response variable and the potential predictors are affected by a phenomenon called spurious correlation: the squared sample correlation between the response variable and a predictor that is independent of it can still be large. This dissertation proposes addressing the problem through sample size design. We start from a theoretical explanation of the spurious correlation phenomenon, based on the independence structure of the sample correlation matrix obtained by Prof. K.W. Ng. Using this structure, we show that spurious correlation can be described by a specific distribution that depends on the sample size and the number of independent predictors. We therefore develop a critical sample size design method that keeps the spurious correlations below any required level with the minimum required sample size. Because false predictors may still be correlated with one another in real applications, we show that the critical sample size remains effective in that case and gives a safer result. Further, we generalize the sample size design method to protect the true predictors from the independent false predictors. Rather than keeping the spurious correlation below a constant level, we ensure that every true predictor shows a higher squared correlation with the response variable than any of the false predictors does. An exact solution for the critical sample size is provided for the scenario with one true predictor. For cases with more than one true predictor, we provide a safer sample size and detailed tuning guidance for users. Lastly, a modified sampling algorithm, named the Dynamic Sampling Importance Resampling algorithm (D-SIR), is proposed. We modify the classical Sampling Importance Resampling algorithm by introducing a dynamic grouping mechanism. The algorithm increases sampling efficiency significantly, so that the sampling time is no longer linear in the required sample size, while the quality of the sample is not necessarily compromised. With the algorithm, a required sample size can be achieved far more easily.
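The spurious correlation phenomenon described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the thesis's method: it assumes standard normal data, and the function names (`squared_correlation`, `max_spurious_r2`, `average_max_r2`) and the chosen values of n and p are hypothetical, introduced only for illustration. It generates a response and p predictors that are all independent of it, then reports the largest squared sample correlation, averaged over a few repetitions.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def max_spurious_r2(n, p, rng):
    """Largest squared correlation between a response and p predictors that
    are generated independently of it (every true correlation is zero)."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, squared_correlation(x, y))
    return best


def average_max_r2(n, p, repeats=10, seed=0):
    """Average of the maximum spurious squared correlation over several runs."""
    rng = random.Random(seed)
    return sum(max_spurious_r2(n, p, rng) for _ in range(repeats)) / repeats


if __name__ == "__main__":
    p = 200  # number of independent (false) predictors; assumed value
    for n in (10, 40, 160):  # sample sizes to compare; assumed values
        print(n, round(average_max_r2(n, p), 3))
```

With p held fixed, the maximum spurious squared correlation shrinks as n grows, which is the effect a sample size design exploits: choosing n large enough keeps that maximum below a required level.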
Degree: Doctor of Philosophy
Subject: Regression analysis
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/238346
HKU Library Item ID: b5824357

 

DC Field | Value | Language
dc.contributor.author | Duan, Zhenchen | -
dc.contributor.author | 段振辰 | -
dc.date.accessioned | 2017-02-10T07:29:33Z | -
dc.date.available | 2017-02-10T07:29:33Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Duan, Z. [段振辰]. (2016). The sample size design in ultrahigh dimensional regression models and extensive simulation validation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/238346 | -
dc.description.abstract | Most classical regression modeling methods are based on correlation learning. In ultrahigh dimensional scenarios, the correlations between the response variable and the potential predictors are affected by a phenomenon called spurious correlation: the squared sample correlation between the response variable and a predictor that is independent of it can still be large. This dissertation proposes addressing the problem through sample size design. We start from a theoretical explanation of the spurious correlation phenomenon, based on the independence structure of the sample correlation matrix obtained by Prof. K.W. Ng. Using this structure, we show that spurious correlation can be described by a specific distribution that depends on the sample size and the number of independent predictors. We therefore develop a critical sample size design method that keeps the spurious correlations below any required level with the minimum required sample size. Because false predictors may still be correlated with one another in real applications, we show that the critical sample size remains effective in that case and gives a safer result. Further, we generalize the sample size design method to protect the true predictors from the independent false predictors. Rather than keeping the spurious correlation below a constant level, we ensure that every true predictor shows a higher squared correlation with the response variable than any of the false predictors does. An exact solution for the critical sample size is provided for the scenario with one true predictor. For cases with more than one true predictor, we provide a safer sample size and detailed tuning guidance for users. Lastly, a modified sampling algorithm, named the Dynamic Sampling Importance Resampling algorithm (D-SIR), is proposed. We modify the classical Sampling Importance Resampling algorithm by introducing a dynamic grouping mechanism. The algorithm increases sampling efficiency significantly, so that the sampling time is no longer linear in the required sample size, while the quality of the sample is not necessarily compromised. With the algorithm, a required sample size can be achieved far more easily. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Regression analysis | -
dc.title | The sample size design in ultrahigh dimensional regression models and extensive simulation validation | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5824357 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.mmsid | 991021210489703414 | -
