postgraduate thesis: Contributions to high-dimensional statistical analysis

Title: Contributions to high-dimensional statistical analysis
Authors: Li, Zhaoyuan (李兆媛)
Issue Date: 2016
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Li, Z. [李兆媛]. (2016). Contributions to high-dimensional statistical analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis investigates new methodologies, with new limiting results and meaningful applications, for several important high-dimensional problems in which the dimension is large compared with the sample size. In the first problem, I generalise two simple but effective procedures, the determinant-based and trace-based criteria, to general populations for high-dimensional classification. Their asymptotic misclassification probabilities are derived using the theory of large-dimensional random matrices. One of the main results is that, in some situations, the misclassification probability cannot vanish even if the sample size becomes very large. The performance of these two criteria is explored for various structures of the mean vector and covariance. In the second problem, I study the question of testing independence between two large sets of variates. The main application here is to infer gene regulatory networks from gene expression data for normal and diseased populations, respectively. The networks are constructed by testing independence between pairs of genes, and the test statistic is constructed from the trace of a suitable large random matrix. Compared with traditional statistical methods, this new method successfully identifies important connections between genes in normal and diseased samples, respectively. In the third problem, I develop new statistical theory for probabilistic principal component analysis models in high dimensions. An accurate estimator of the noise variance is proposed. Using random matrix theory, the asymptotic normality of this estimator is established for the Gaussian and non-Gaussian cases, respectively. In addition, based on this new estimator of the noise variance, I develop several important applications, including constructing a new criterion for determining the number of principal components and deriving new asymptotics for the related goodness-of-fit statistic.
In the last problem, I propose new tests to detect heteroscedasticity in high-dimensional linear regression. Using the theory of random Haar orthogonal matrices, the asymptotic normality of the test statistics is obtained under the null hypothesis and the assumption that the degrees of freedom of the model tend to infinity. These new tests are dimension-proof, which guarantees their wide applicability to different combinations of sample size and dimension. Extensive Monte Carlo experiments and real data analyses demonstrate the superiority of the proposed tests over traditional methods in terms of size and power.
Degree: Doctor of Philosophy
Subject: Mathematical statistics
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/235899
HKU Library Item ID: b5801645

 

DC Field | Value | Language
dc.contributor.author | Li, Zhaoyuan | -
dc.contributor.author | 李兆媛 | -
dc.date.accessioned | 2016-11-09T23:26:59Z | -
dc.date.available | 2016-11-09T23:26:59Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Li, Z. [李兆媛]. (2016). Contributions to high-dimensional statistical analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/235899 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Mathematical statistics | -
dc.title | Contributions to high-dimensional statistical analysis | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5801645 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5801645 | -
dc.identifier.mmsid | 991020813029703414 | -
