
Postgraduate thesis: Generalization analysis and regularization in over-parameterized models

Title: Generalization analysis and regularization in over-parameterized models
Authors: Meng, Xuran (孟徐然)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Meng, X. [孟徐然]. (2024). Generalization analysis and regularization in over-parameterized models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: We study the success of over-parameterized models in both regression and classification tasks. In the regression task, we uncover the phenomenon of multiple descent in random feature models, where the test error curve exhibits multiple descents as the number of model parameters increases. In the classification task, we theoretically establish the capability of two-layer ReLU convolutional neural networks to learn complex XOR data. We find that these networks can achieve the Bayes-optimal test accuracy when the data signal-to-noise ratio (SNR) is high. Through our theoretical investigations, we discover that benign overfitting occurs only when the data set has a high SNR. Models trained on low-SNR data consistently exhibit poor test performance, indicating harmful overfitting of the training data set. We also explore two regularization techniques that address harmful overfitting on low-SNR data sets for over-parameterized models. First, we investigate gradient regularization and its role during the training process. Our theoretical analysis reveals that gradient regularization effectively suppresses the memorization of noise within the model; consequently, models trained with gradient regularization exhibit improved signal learning compared to models trained without it. Second, we explore early stopping as a regularization technique. By observing the spectra of weight matrices during training, we identify deviations from the Marchenko–Pastur law, and we find that these deviations indicate either that the model has extracted sufficient training information or that training issues are emerging. Based on this, we propose a spectral criterion that guides early stopping during training. Overall, this thesis investigates the success of over-parameterized models in various learning tasks: we characterize the conditions under which these models perform well and study regularization techniques that mitigate harmful overfitting.

(Illustrative code sketches of the random feature setup, gradient regularization, and the spectral criterion follow this record.)
Degree: Doctor of Philosophy
Subject: Machine learning
Subject: Data mining
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/345401
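The descent phenomena in random feature regression can be illustrated numerically. Below is a minimal NumPy sketch, not taken from the thesis: it fits min-norm least squares on random ReLU features of synthetic Gaussian data and prints test error as the feature count N crosses the interpolation threshold N ≈ n, where the error typically spikes before descending again (the simplest, double-descent instance of the phenomenon; the thesis's multiple-descent construction is more delicate). The data model and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20                                # training samples, input dimension

# Synthetic linear data with label noise (illustrative, not the thesis's setup)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)
X_test = rng.standard_normal((1000, d))
y_test = X_test @ w_star

for N in [10, 50, 90, 100, 110, 200, 1000]:   # number of random features
    W = rng.standard_normal((d, N)) / np.sqrt(d)
    Phi = np.maximum(X @ W, 0.0)              # random ReLU features
    Phi_test = np.maximum(X_test @ W, 0.0)
    theta = np.linalg.pinv(Phi) @ y           # min-norm least squares fit
    err = np.mean((Phi_test @ theta - y_test) ** 2)
    print(f"N={N:5d}  test MSE={err:.3f}")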
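Gradient regularization penalizes the norm of the loss gradient alongside the loss itself. The sketch below is a hypothetical toy instance for a linear model with squared loss, where the penalty gradient has the closed form 2λH∇L(w) and no automatic differentiation is needed; the thesis's actual setting (two-layer ReLU CNNs) is more involved, and all constants here are assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 200                     # over-parameterized: more weights than samples
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.3 * rng.standard_normal(n)   # one signal direction plus label noise

lam, lr = 0.1, 0.05                # penalty strength and step size (illustrative)
w = np.zeros(d)
H = X.T @ X / n                    # Hessian of L(w) = ||Xw - y||^2 / (2n)

for _ in range(500):
    g = X.T @ (X @ w - y) / n      # gradient of the unregularized loss
    # Penalized objective L(w) + lam * ||grad L(w)||^2; for this quadratic
    # loss, grad ||g||^2 = 2 H g, so the update is available in closed form.
    w -= lr * (g + 2 * lam * H @ g)

print(abs(w[0]), np.linalg.norm(w[1:]))      # signal weight vs. noise-fitting mass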
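The spectral early-stopping criterion rests on comparing the empirical spectrum of a weight matrix with the Marchenko–Pastur bulk predicted for i.i.d. entries. Here is a minimal sketch of such a check, assuming i.i.d. initialization and estimating the entry variance by the eigenvalue mean (which equals the variance under the MP law); the helper mp_outliers is hypothetical, not the thesis's criterion.

import numpy as np

def mp_outliers(W):
    # Count eigenvalues of W^T W / p beyond the Marchenko-Pastur upper edge,
    # for a (p, q) matrix W with p >= q and (approximately) i.i.d. entries.
    p, q = W.shape
    gamma = q / p
    evals = np.linalg.eigvalsh(W.T @ W / p)
    sigma2 = evals.mean()                       # crude entry-variance estimate
    upper = sigma2 * (1 + np.sqrt(gamma)) ** 2  # MP bulk upper edge
    return int(np.sum(evals > upper))

rng = np.random.default_rng(2)
W_noise = 0.1 * rng.standard_normal((512, 128))   # pure noise, e.g. at init
spike = 0.05 * np.outer(rng.standard_normal(512),
                        rng.standard_normal(128)) # a learned rank-one direction
print(mp_outliers(W_noise))                       # ~0: spectrum inside the bulk
print(mp_outliers(W_noise + spike))               # >=1: eigenvalue escapes the edge

In a training loop, one plausible use is to run such a check on each layer's weights every few epochs: persistent outliers suggest the weights have picked up structure beyond initialization noise, which is the kind of spectral deviation the proposed criterion turns into a stopping rule.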

 

Full item record (Dublin Core)

dc.contributor.author: Meng, Xuran
dc.contributor.author: 孟徐然
dc.date.accessioned: 2024-08-26T08:59:32Z
dc.date.available: 2024-08-26T08:59:32Z
dc.date.issued: 2024
dc.identifier.citation: Meng, X. [孟徐然]. (2024). Generalization analysis and regularization in over-parameterized models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/345401
dc.description.abstract: (abstract as above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Data mining
dc.title: Generalization analysis and regularization in over-parameterized models
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Statistics and Actuarial Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044843665703414
