postgraduate thesis: Optimal transport in machine learning : analysis and application
| Title | Optimal transport in machine learning : analysis and application |
|---|---|
| Authors | Zhang, Jie (張婕) |
| Advisors | Zhang, Z |
| Issue Date | 2024 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Zhang, J. [張婕]. (2024). Optimal transport in machine learning : analysis and application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Optimal transport is a powerful tool in machine learning because of its ability to extract distributional information. This thesis focuses on the practical aspects of optimal transport in machine learning, including mathematical formulations and analysis as well as real-world applications. The thesis is divided into three parts according to the task addressed.
The first part considers the clustering problem, specifically spectral clustering and subspace clustering. By considering a self optimal transport model within a single group of samples, we observe that both subspace clustering and spectral clustering can be explained in the framework of optimal transport, and that the optimal transport matrix bridges the feature space and the space of spectral embeddings. Inspired by this connection, we propose a spectral optimal transport barycenter model, which learns spectral embeddings by solving a barycenter problem equipped with an optimal transport discrepancy and guidance from the data. With this model, we can exploit both the feature information and the structural information in the data to learn coupled spectral embeddings and an affinity matrix in a unified model. The corresponding numerical results show the significance of spectral embedding learning in spectral clustering.
We therefore further investigate optimal transport methods for learning spectral embeddings. By enriching the sample space and the cluster space with similarity matrices and probability measures, we model the similarity relationship of spectral embeddings as a Gromov-Wasserstein barycenter of the measured similarity matrices of samples and clusters. Based on affinity matrix learning and spectrum similarity matrix learning, we propose two methods for recovering spectral embeddings, and show that traditional spectral clustering can be derived from our proposed methods as a special case.
The second part addresses positive and unlabeled learning. Motivated by the fact that the potential positive samples in the unlabeled training data follow a distribution similar to that of the labeled positive training samples, we propose a novel optimal transport model with a regularized marginal distribution to seek samples that are distributed similarly to the labeled positive samples. In this way, positive and negative samples in the unlabeled data can be distinguished, and a traditional binary classification approach can then be applied.
The third part focuses on applications in windshear detection, which is crucial for aviation safety. We first propose an optimization model for Light Detection and Ranging (LiDAR) data preprocessing, consisting of a data-fitting term, a polar total variation smoothing term, and a signal-to-noise ratio (SNR) weighting term for filtering bad observations; its regularization parameters can be selected automatically. We then propose two statistical indicators of windshear, computed from the LiDAR plan position indicator (PPI) scan observational wind velocity data, for constructing windshear features: one is based on windshear properties, and the other is derived from the LiDAR image texture. Finally, we apply the optimal transport based positive and unlabeled learning method to windshear detection.
Overall, this thesis contributes to the development of optimal transport in machine learning. The corresponding analysis and numerical results demonstrate the effectiveness of the proposed methods. |
| Degree | Doctor of Philosophy |
| Subject | Transportation problems (Programming); Mathematical optimization; Machine learning |
| Dept/Program | Mathematics |
| Persistent Identifier | http://hdl.handle.net/10722/363840 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Zhang, Z | - |
| dc.contributor.author | Zhang, Jie | - |
| dc.contributor.author | 張婕 | - |
| dc.date.accessioned | 2025-10-13T08:11:02Z | - |
| dc.date.available | 2025-10-13T08:11:02Z | - |
| dc.date.issued | 2024 | - |
| dc.identifier.citation | Zhang, J. [張婕]. (2024). Optimal transport in machine learning : analysis and application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/363840 | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Transportation problems (Programming) | - |
| dc.subject.lcsh | Mathematical optimization | - |
| dc.subject.lcsh | Machine learning | - |
| dc.title | Optimal transport in machine learning : analysis and application | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Mathematics | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2024 | - |
| dc.identifier.mmsid | 991044860749903414 | - |
