
Postgraduate Thesis: Modeling sequential dependence in statistics and machine learning

Title: Modeling sequential dependence in statistics and machine learning
Authors: Huang, Feiqing
Advisors: Li, G
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Huang, F. (2023). Modeling sequential dependence in statistics and machine learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: In the era of big data, sequential data modeling has a wide scope of applications, ranging from weather forecasting and energy and stock market prediction to music generation and machine translation. The key to modeling sequential data lies in learning the sequential dependence, also known as serial dependence, therein. It has attracted attention from both statisticians and machine learning practitioners, and many models have been proposed to tackle large sequential data. In statistics, models such as the VAR and VARMA are commonly adopted to analyze high-dimensional time series. On the other hand, machine learning algorithms apply deep neural networks, including recurrent networks and Transformer-based models, to learn from audio, video, or language datasets. This thesis proposes methods from both statistical and machine learning viewpoints to capture sequential dependence with improved forecasting performance and sample efficiency over existing methods.

In the first part of this thesis, a general framework is proposed for modeling high-dimensional low-rank linear time series. Specifically, we develop an estimation method and algorithm for high-dimensional general linear processes (GLP), with detailed statistical and convergence analysis. Although the GLP is the most general linear time series model, it has not yet been studied systematically in the high-dimensional literature; this thesis contributes to filling this gap. Simulations are conducted to verify the theoretical results, and empirical studies demonstrate the usefulness of the proposed method.

Secondly, this thesis introduces a novel VARMA variant that not only preserves the parsimony and rich temporal dependence structures of the VARMA model but also avoids its two notorious drawbacks, namely non-identifiability and computational intractability, even for moderate-dimensional data. Moreover, its parameter estimation is scalable with respect to the complexity of the temporal dependence, namely the number of decay patterns constituting the autoregressive structure; hence it is called the scalable ARMA (SARMA) model. In the high-dimensional setup, we further impose a low-Tucker-rank assumption on the coefficient tensor of the proposed model. This leads to desirable dynamic factor interpretations, making the model especially suited to financial and economic data. We derive non-asymptotic error bounds for the proposed estimator and propose a tractable alternating least squares algorithm. Theoretical and computational properties of the proposed method are verified by simulation studies, and the advantages over existing methods are illustrated in real applications.

Lastly, inspired by the derivation of the SARMA model, this thesis applies the same technique to rewrite an RNN layer as a lightweight positional encoding matrix for self-attention, named the Recurrence Encoding Matrix (REM). This motivates a solution that seamlessly incorporates the recurrent dynamics of an RNN into a Transformer, leading to the newly proposed Self-Attention with Recurrence (RSA) module. The proposed module leverages the recurrent inductive bias of REMs to model the recurrent signals, while the self-attention models the remaining non-recurrent signals. The relative proportions of these two components are controlled by a data-driven gating mechanism, which is the key to better sample efficiency than the corresponding baseline Transformer. The effectiveness of RSA modules is demonstrated on four sequential learning tasks in machine learning.
Degree: Doctor of Philosophy
Subject: Sequential analysis; Machine learning
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/328600
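For reference, the classical forms of the linear time series models named in the abstract are sketched below in LaTeX. These are standard textbook definitions with generic notation; the thesis's specific high-dimensional, low-rank parametrizations (including the SARMA decay-pattern structure) are not reproduced here.

```latex
% Standard textbook forms of the models named in the abstract
% (generic notation; not the thesis's exact parametrization).

% General linear process (GLP): an infinite-order moving average of
% white-noise innovations \varepsilon_t for a d-dimensional series y_t.
\[
  \mathbf{y}_t = \sum_{j=0}^{\infty} \Psi_j \,\boldsymbol{\varepsilon}_{t-j},
  \qquad \Psi_0 = I_d .
\]

% VARMA(p, q): finite autoregressive and moving-average components.
\[
  \mathbf{y}_t = \sum_{i=1}^{p} \Phi_i \,\mathbf{y}_{t-i}
               + \boldsymbol{\varepsilon}_t
               + \sum_{j=1}^{q} \Theta_j \,\boldsymbol{\varepsilon}_{t-j} .
\]

% Tucker decomposition of a third-order coefficient tensor \mathcal{A},
% the kind of low-rank structure the abstract imposes on the SARMA
% coefficients: a small core tensor \mathcal{G} multiplied along each
% mode by factor matrices U_1, U_2, U_3.
\[
  \mathcal{A} = \mathcal{G} \times_1 U_1 \times_2 U_2 \times_3 U_3 .
\]
```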

 

DC Field: Value
dc.contributor.advisor: Li, G
dc.contributor.author: Huang, Feiqing
dc.date.accessioned: 2023-06-29T05:44:35Z
dc.date.available: 2023-06-29T05:44:35Z
dc.date.issued: 2023
dc.identifier.citation: Huang, F. (2023). Modeling sequential dependence in statistics and machine learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/328600
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Sequential analysis
dc.subject.lcsh: Machine learning
dc.title: Modeling sequential dependence in statistics and machine learning
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Statistics and Actuarial Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044695780603414
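The final part of the abstract describes the Self-Attention with Recurrence (RSA) module as a gated mixture of standard self-attention and a Recurrence Encoding Matrix (REM). The Python sketch below is an illustrative, single-head reading of that description under assumed details: a single exponential decay rate `lam` generates one REM, and a scalar sigmoid gate mixes it with softmax attention. It is not the thesis's implementation.

```python
# Illustrative sketch only: a gated combination of softmax self-attention
# with a fixed "recurrence encoding" matrix, as described at a high level
# in the abstract. Names and the exact parametrization (one decay rate
# `lam`, a scalar sigmoid gate) are assumptions for illustration.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def recurrence_encoding_matrix(seq_len: int, lam: float) -> torch.Tensor:
    """Lower-triangular matrix P with P[i, j] = lam**(i - j) for j <= i.

    This mimics the exponentially decaying influence of past positions
    that an RNN-style recurrence induces (one assumed decay pattern).
    """
    idx = torch.arange(seq_len)
    powers = idx.unsqueeze(1) - idx.unsqueeze(0)      # entry (i, j) = i - j
    mask = powers >= 0                                # keep the causal part only
    return torch.where(mask, lam ** powers.clamp(min=0).float(),
                       torch.zeros(seq_len, seq_len))


class GatedRecurrentAttentionSketch(nn.Module):
    """Single-head sketch: gate between self-attention and the REM term."""

    def __init__(self, d_model: int, lam: float = 0.9):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate_logit = nn.Parameter(torch.zeros(1))  # data-driven gate
        self.lam = lam

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, d)
        B, T, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
        rem = recurrence_encoding_matrix(T, self.lam).to(x.device)
        g = torch.sigmoid(self.gate_logit)               # proportion of recurrent signal
        mixed = (1.0 - g) * attn + g * rem               # gated mixture of the two components
        return mixed @ v
```

In this sketch a single decay rate yields one recurrent pattern; the multiple decay patterns mentioned in the abstract would correspond to several such matrices, each with its own gate, alongside the multi-head attention of a full Transformer.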
