postgraduate thesis: Unsupervised learning on scientific ocean drill datasets in the South China Sea
Title | Unsupervised learning on scientific ocean drill datasets in the South China Sea |
---|---|
Authors | Tse, Chi-hoi (謝至愷) |
Advisors | Lam, EYM; Li, Y |
Issue Date | 2018 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Tse, C. [謝至愷]. (2018). Unsupervised learning on scientific ocean drill datasets in the South China Sea. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In this interdisciplinary research, unsupervised learning methods are employed to study scientific ocean drilling data of the South China Sea for the first time. A data analysis pipeline consisting of five unsupervised learning algorithms, namely K-means, Hierarchical Clustering (HC), Self-Organizing Maps (SOM), Random Forest (RF) and Sparse Autoencoder (SA), is designed to experiment with multivariate geophysical datasets from Ocean Drilling Program (ODP) sites 1146 and 1148 and Integrated Ocean Drilling Program (IODP) sites U1431 and U1433. Compared with conventional methods, unsupervised learning methods do not require any a priori or expert knowledge and have the potential to unveil data structures previously unknown to traditional analytical methods. Data clusters produced by the five methods reveal the natural data structure present in the datasets objectively, without subjective presumption. Insights into the relevance of such clusters to the physical world are gained by comparing them to the existing classification of the drilling cores by lithologic units and geologic time scales against depths below seafloor. The correspondence between the existing classification and the clustering results demonstrates the applicability of the unsupervised methods to these specific datasets. This pioneering work suggests that unsupervised learning methods originating from computational data analysis are capable of revealing previously unexplored data patterns within the datasets studied. Clustering results from ODP sites 1146 and 1148 are observed to display a higher correspondence with existing classifications than results from IODP sites U1431 and U1433. As for the unsupervised learning methods, SOM, RF and SA are found to yield a higher Rand Index for datasets from the same site. Similarity analysis of the datasets as time-series data is also carried out to understand the intrinsic relationships among the datasets in an objective way. The unsupervised learning methodology explored in this work lays the groundwork for future machine learning frameworks that would enable data-driven scientific discovery from ocean drilling data. |
Degree | Doctor of Philosophy |
Subject | Underwater drilling; Geophysics - Mathematical models; Machine learning |
Dept/Program | Earth Sciences |
Persistent Identifier | http://hdl.handle.net/10722/261495 |
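
The abstract reports a Rand Index when comparing clustering results with the existing classification of the cores. As a minimal sketch of how such a score can be computed (not the thesis's own implementation), the function below counts the fraction of sample pairs on which two labelings agree. The function name `rand_index` and the example labels are hypothetical and are not taken from the thesis data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the Rand Index between two labelings of the same samples.

    A pair of samples counts as an agreement when both labelings put the
    pair in the same group, or both put it in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical example: lithologic unit of six depth samples versus
# cluster labels assigned by some clustering method.
units = ["I", "I", "II", "II", "III", "III"]
clusters = [0, 0, 1, 1, 1, 2]
score = rand_index(units, clusters)
```

A score of 1 means the two labelings induce identical groupings, regardless of how the groups are named. The plain Rand Index is not corrected for chance agreement.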
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lam, EYM | - |
dc.contributor.advisor | Li, Y | - |
dc.contributor.author | Tse, Chi-hoi | - |
dc.contributor.author | 謝至愷 | - |
dc.date.accessioned | 2018-09-20T06:43:56Z | - |
dc.date.available | 2018-09-20T06:43:56Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Tse, C. [謝至愷]. (2018). Unsupervised learning on scientific ocean drill datasets in the South China Sea. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/261495 | - |
dc.description.abstract | In this interdisciplinary research, unsupervised learning methods are employed to study scientific ocean drilling data of the South China Sea for the first time. A data analysis pipeline consisting of five unsupervised learning algorithms, namely K-means, Hierarchical Clustering (HC), Self-Organizing Maps (SOM), Random Forest (RF) and Sparse Autoencoder (SA), is designed to experiment with multivariate geophysical datasets from Ocean Drilling Program (ODP) sites 1146 and 1148 and Integrated Ocean Drilling Program (IODP) sites U1431 and U1433. Compared with conventional methods, unsupervised learning methods do not require any a priori or expert knowledge and have the potential to unveil data structures previously unknown to traditional analytical methods. Data clusters produced by the five methods reveal the natural data structure present in the datasets objectively, without subjective presumption. Insights into the relevance of such clusters to the physical world are gained by comparing them to the existing classification of the drilling cores by lithologic units and geologic time scales against depths below seafloor. The correspondence between the existing classification and the clustering results demonstrates the applicability of the unsupervised methods to these specific datasets. This pioneering work suggests that unsupervised learning methods originating from computational data analysis are capable of revealing previously unexplored data patterns within the datasets studied. Clustering results from ODP sites 1146 and 1148 are observed to display a higher correspondence with existing classifications than results from IODP sites U1431 and U1433. As for the unsupervised learning methods, SOM, RF and SA are found to yield a higher Rand Index for datasets from the same site. Similarity analysis of the datasets as time-series data is also carried out to understand the intrinsic relationships among the datasets in an objective way. The unsupervised learning methodology explored in this work lays the groundwork for future machine learning frameworks that would enable data-driven scientific discovery from ocean drilling data. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Underwater drilling | - |
dc.subject.lcsh | Geophysics - Mathematical models | - |
dc.subject.lcsh | Machine learning | - |
dc.title | Unsupervised learning on scientific ocean drill datasets in the South China Sea | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Earth Sciences | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044040577903414 | - |
dc.date.hkucongregation | 2018 | - |
dc.identifier.mmsid | 991044040577903414 | - |