Optimal transport in machine learning : analysis and application

Zhang, Jie; 張婕

Appears in Collections:
- HKU Theses Online
- Mathematics: Theses

postgraduate thesis: Optimal transport in machine learning : analysis and application

Title	Optimal transport in machine learning : analysis and application
Authors	Zhang, Jie 張婕
Advisors	Zhang, Z
Issue Date	2024
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhang, J. [張婕]. (2024). Optimal transport in machine learning : analysis and application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Optimal transport is a powerful tool in machine learning because of its ability to extract distributional information. This thesis focuses on the practical aspects of optimal transport in machine learning, including mathematical formulations and analysis as well as real-world applications. The thesis is divided into three parts according to task. The first part considers the clustering problem, specifically spectral clustering and subspace clustering. By considering a self optimal transport model within a single group of samples, we observe that both subspace clustering and spectral clustering can be explained in the framework of optimal transport, and that the optimal transport matrix bridges the spaces of features and spectral embeddings. Inspired by this connection, we propose a spectral optimal transport barycenter model, which learns spectral embeddings by solving a barycenter problem equipped with an optimal transport discrepancy and guided by the data. The proposed model exploits both the feature and the structural information in the data to learn coupled spectral embeddings and an affinity matrix in a unified model. The corresponding numerical results show the significance of spectral embedding learning in spectral clustering. Hence, we further investigate optimal transport methods for spectral embedding learning. By enriching the sample space and the cluster space with similarity matrices and probability measures, we model the similarity relationship of spectral embeddings as a Gromov-Wasserstein barycenter of the measured similarity matrices of samples and clusters. Based on affinity matrix learning and spectrum similarity matrix learning, we propose two methods for recovering spectral embeddings, and show that traditional spectral clustering can be derived from the proposed methods as a special case. The second part addresses positive and unlabeled learning. Motivated by the fact that the potential positive samples in the unlabeled training data follow a distribution similar to that of the positive labeled training samples, we propose a novel optimal transport model with a regularized marginal distribution to seek samples that are distributed similarly to the positive labeled samples. In this way, positive and negative samples in the unlabeled data can be distinguished, and a traditional binary classification approach can then be applied. The third part focuses on applications to windshear detection, which is crucial for aviation safety. We first propose an optimization model for preprocessing Light Detection and Ranging (LiDAR) data, consisting of a data-fitting term, a polar total variation smoothing term, and a signal-to-noise ratio (SNR) weighting term for filtering bad observations; its regularization parameters can be selected automatically. We then propose two statistical indicators of windshear, computed from LiDAR PPI scan observational wind velocity data, for constructing windshear features: one is based on a windshear property, and the other is derived from the LiDAR image texture. Finally, we apply the optimal transport based positive and unlabeled learning method to windshear detection. Overall, this thesis contributes to the development of optimal transport in machine learning. The corresponding analysis and numerical results demonstrate the effectiveness of the proposed methods.
Degree	Doctor of Philosophy
Subject	Transportation problems (Programming); Mathematical optimization; Machine learning
Dept/Program	Mathematics
Persistent Identifier	http://hdl.handle.net/10722/363840

DC Field	Value	Language
dc.contributor.advisor	Zhang, Z	-
dc.contributor.author	Zhang, Jie	-
dc.contributor.author	張婕	-
dc.date.accessioned	2025-10-13T08:11:02Z	-
dc.date.available	2025-10-13T08:11:02Z	-
dc.date.issued	2024	-
dc.identifier.citation	Zhang, J. [張婕]. (2024). Optimal transport in machine learning : analysis and application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/363840	-
dc.description.abstract	Optimal transport is a powerful tool in machine learning because of its ability to extract distributional information. This thesis focuses on the practical aspects of optimal transport in machine learning, including mathematical formulations and analysis as well as real-world applications. The thesis is divided into three parts according to task. The first part considers the clustering problem, specifically spectral clustering and subspace clustering. By considering a self optimal transport model within a single group of samples, we observe that both subspace clustering and spectral clustering can be explained in the framework of optimal transport, and that the optimal transport matrix bridges the spaces of features and spectral embeddings. Inspired by this connection, we propose a spectral optimal transport barycenter model, which learns spectral embeddings by solving a barycenter problem equipped with an optimal transport discrepancy and guided by the data. The proposed model exploits both the feature and the structural information in the data to learn coupled spectral embeddings and an affinity matrix in a unified model. The corresponding numerical results show the significance of spectral embedding learning in spectral clustering. Hence, we further investigate optimal transport methods for spectral embedding learning. By enriching the sample space and the cluster space with similarity matrices and probability measures, we model the similarity relationship of spectral embeddings as a Gromov-Wasserstein barycenter of the measured similarity matrices of samples and clusters. Based on affinity matrix learning and spectrum similarity matrix learning, we propose two methods for recovering spectral embeddings, and show that traditional spectral clustering can be derived from the proposed methods as a special case. The second part addresses positive and unlabeled learning. Motivated by the fact that the potential positive samples in the unlabeled training data follow a distribution similar to that of the positive labeled training samples, we propose a novel optimal transport model with a regularized marginal distribution to seek samples that are distributed similarly to the positive labeled samples. In this way, positive and negative samples in the unlabeled data can be distinguished, and a traditional binary classification approach can then be applied. The third part focuses on applications to windshear detection, which is crucial for aviation safety. We first propose an optimization model for preprocessing Light Detection and Ranging (LiDAR) data, consisting of a data-fitting term, a polar total variation smoothing term, and a signal-to-noise ratio (SNR) weighting term for filtering bad observations; its regularization parameters can be selected automatically. We then propose two statistical indicators of windshear, computed from LiDAR PPI scan observational wind velocity data, for constructing windshear features: one is based on a windshear property, and the other is derived from the LiDAR image texture. Finally, we apply the optimal transport based positive and unlabeled learning method to windshear detection. Overall, this thesis contributes to the development of optimal transport in machine learning. The corresponding analysis and numerical results demonstrate the effectiveness of the proposed methods.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Transportation problems (Programming)	-
dc.subject.lcsh	Mathematical optimization	-
dc.subject.lcsh	Machine learning	-
dc.title	Optimal transport in machine learning : analysis and application	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Mathematics	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2024	-
dc.identifier.mmsid	991044860749903414	-
