postgraduate thesis: Spike detection : random matrix theory applications on high-dimensional data
Title  Spike detection : random matrix theory applications on high-dimensional data 

Authors  Xu, Yuyang [徐毓阳] 
Issue Date  2022 
Publisher  The University of Hong Kong (Pokfulam, Hong Kong) 
Citation  Xu, Y. [徐毓阳]. (2022). Spike detection : random matrix theory applications on high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. 
Abstract  Spike detection using Random Matrix Theory (RMT) on high-dimensional data is investigated in this thesis.
In the first part, we study a “mysterious” phase transition phenomenon raised by Nakatsukasa et al. (2013) in the spectra of the graph Laplacian matrices of dendrite graphs from biological experiments on mouse retinal ganglion cells. While the bulk of the spectrum can be well understood through structures resembling starlike trees, the spikes, that is, isolated eigenvalues outside the bulk spectrum, remain unexplained. We bring new insights to these mysteries by considering a class of uniform trees. Exact relationships between the number of such spikes and the number of T-junctions are analyzed as a function of the number of vertices separating the T-junctions. Using these theoretical results, predictions are proposed for the number of spikes observed in real-life dendrite graphs. These predictions match the observed numbers of spikes well, confirming the practical value of our theoretical results.
In the second part, we introduce a method called ERStruct to estimate the number of top informative principal components (PCs) in whole genome sequencing data, accounting for the complicated linkage disequilibrium (LD) structure between genetic markers. There are two important issues with the traditional method of Patterson, Price, and Reich (2006). First, the number of genetic variants p is much larger than the sample size n in sequencing data, so that the sample-to-marker ratio n/p is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method may not handle linkage disequilibrium well in sequencing data. To resolve these two practical issues, ERStruct uses the ratio of consecutive eigenvalues as a more robust test statistic and approximates its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method. 
Degree  Doctor of Philosophy 
Subject  Random matrices 
Dept/Program  Statistics and Actuarial Science 
Persistent Identifier  http://hdl.handle.net/10722/325819 
DC Field  Value  Language 

dc.contributor.author  Xu, Yuyang   
dc.contributor.author  徐毓阳   
dc.date.accessioned  2023-03-02T16:33:05Z   
dc.date.available  2023-03-02T16:33:05Z   
dc.date.issued  2022   
dc.identifier.citation  Xu, Y. [徐毓阳]. (2022). Spike detection : random matrix theory applications on high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.   
dc.identifier.uri  http://hdl.handle.net/10722/325819   
dc.description.abstract  Spike detection using Random Matrix Theory (RMT) on high-dimensional data is investigated in this thesis. In the first part, we study a “mysterious” phase transition phenomenon raised by Nakatsukasa et al. (2013) in the spectra of the graph Laplacian matrices of dendrite graphs from biological experiments on mouse retinal ganglion cells. While the bulk of the spectrum can be well understood through structures resembling starlike trees, the spikes, that is, isolated eigenvalues outside the bulk spectrum, remain unexplained. We bring new insights to these mysteries by considering a class of uniform trees. Exact relationships between the number of such spikes and the number of T-junctions are analyzed as a function of the number of vertices separating the T-junctions. Using these theoretical results, predictions are proposed for the number of spikes observed in real-life dendrite graphs. These predictions match the observed numbers of spikes well, confirming the practical value of our theoretical results. In the second part, we introduce a method called ERStruct to estimate the number of top informative principal components (PCs) in whole genome sequencing data, accounting for the complicated linkage disequilibrium (LD) structure between genetic markers. There are two important issues with the traditional method of Patterson, Price, and Reich (2006). First, the number of genetic variants p is much larger than the sample size n in sequencing data, so that the sample-to-marker ratio n/p is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method may not handle linkage disequilibrium well in sequencing data. To resolve these two practical issues, ERStruct uses the ratio of consecutive eigenvalues as a more robust test statistic and approximates its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method.   
dc.language  eng   
dc.publisher  The University of Hong Kong (Pokfulam, Hong Kong)   
dc.relation.ispartof  HKU Theses Online (HKUTO)   
dc.rights  The author retains all proprietary rights, (such as patent rights) and the right to use in future works.   
dc.rights  This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.   
dc.subject.lcsh  Random matrices   
dc.title  Spike detection : random matrix theory applications on high-dimensional data   
dc.type  PG_Thesis   
dc.description.thesisname  Doctor of Philosophy   
dc.description.thesislevel  Doctoral   
dc.description.thesisdiscipline  Statistics and Actuarial Science   
dc.description.nature  published_or_final_version   
dc.date.hkucongregation  2022   
dc.identifier.mmsid  991044649902603414   