postgraduate thesis: New algorithms in factor analysis : applications, model selection and findings in bioinformatics

Title: New algorithms in factor analysis : applications, model selection and findings in bioinformatics
Authors: Wu, Ho-chun (胡皓竣)
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wu, H. [胡皓竣]. (2013). New algorithms in factor analysis : applications, model selection and findings in bioinformatics. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5153672
Abstract: Advancements in microelectronic devices and in computational and storage technologies enable the collection of high-volume, high-speed and high-dimensional data in many applications. Due to the high dimensionality of these measurements, the dependence of the observations on the various parameters or variables may not be exactly known. Factor analysis (FA) is a useful multivariate technique to exploit the redundancies among observations and reveal their dependence on some latent variables called factors. Some major issues of conventional FA are high arithmetic complexity for real-time online implementation, the assumption of static system parameters, the demand for interval forecasting, robustness against outlying observations and model selection in problems with high dimension but a low number of samples (HDLS). This thesis addresses these issues and proposes new extensions to the existing FA algorithms. First, in order to reduce the arithmetic complexity, we propose new recursive FA (RFA) algorithms that recursively compute only the dominant principal components (PCs) and eigenvalues in the major subspace tracked by efficient subspace tracking algorithms. Specifically, two new approaches, namely rank-1 modification and deflation, are proposed for updating the PCs and eigenvalues in the classical fault detection problem, with different tradeoffs between accuracy and arithmetic complexity. They significantly reduce the online arithmetic complexity and allow adaptation to time-varying system parameters. Second, we extend the RFA algorithm to the forecasting of time series and propose a new recursive dynamic factor analysis (RDFA) algorithm for electricity price forecasting. While the PCs are recursively tracked by the subspace algorithm, a random walk or a state dynamical model can be incorporated to describe the latest state of the time-varying auto-regressive (AR) model built from the factors. This formulation can be solved by the celebrated Kalman filter (KF), which in turn allows future values to be forecast with estimated confidence intervals. Third, based on the concept of robust M-estimation, we propose new robust covariance and outlier detection criteria to improve the robustness of the proposed RFA and RDFA algorithms against outlying observations. Experimental results show that the proposed methods can effectively suppress the adverse contributions of the outliers to the factors and PCs. Finally, in order to improve the consistency of model selection and facilitate the estimation of p-values in HDLS problems, we propose a new automatic model selection method based on ridge partial least squares and recursive feature elimination. Furthermore, a novel performance criterion is proposed for ranking variables according to how consistently they are chosen under different perturbations of the samples. Using this criterion, the associated p-values can be estimated under the HDLS setting. Experimental results using real cancer gene microarray datasets show that improved prognosis can be obtained by the proposed approach as compared with conventional techniques. Furthermore, to quantify their statistical significance, the p-values of the identified genes are estimated, and functional analysis of the significant genes found in the diffuse large B-cell lymphoma (DLBCL) gene microarray data is performed to validate the findings. While we focus on a few engineering problems, these algorithms are also applicable to other related applications.
Degree: Doctor of Philosophy
Subject: Factor analysis; Bioinformatics - Mathematical models
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/205839

 

DC Field | Value | Language
dc.contributor.author | Wu, Ho-chun | -
dc.contributor.author | 胡皓竣 | -
dc.date.accessioned | 2014-10-10T23:13:42Z | -
dc.date.available | 2014-10-10T23:13:42Z | -
dc.date.issued | 2013 | -
dc.identifier.citation | Wu, H. [胡皓竣]. (2013). New algorithms in factor analysis : applications, model selection and findings in bioinformatics. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5153672 | -
dc.identifier.uri | http://hdl.handle.net/10722/205839 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Factor analysis | -
dc.subject.lcsh | Bioinformatics - Mathematical models | -
dc.title | New algorithms in factor analysis : applications, model selection and findings in bioinformatics | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5153672 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5153672 | -