postgraduate thesis: Clustering for data analysis and privacy preservation in machine learning applications
Title | Clustering for data analysis and privacy preservation in machine learning applications |
---|---|
Authors | Zhi, Yajing (職亞婧) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In this thesis, we explore the potential of integrating machine learning techniques, such as BERT, Transformer models, and federated learning, for predicting cryptocurrency price trends, specifically Bitcoin, and enabling collaborative and privacy-preserving deep learning for vision tasks. Analyzing almost five years of Reddit data, we investigate the Granger causality link between post volume, post sentiment, and Bitcoin price. Our findings demonstrate that post volume on Reddit better explains price trends than historical prices, emphasizing the importance of considering social media data in financial market predictions.
We further evaluate the effectiveness of incorporating social media data and natural language processing methods, such as the Transformer model, in forecasting market trends. We apply different data clustering methods to process the original social media data in a rolling manner. We showcase the potential of the Transformer architecture in predicting Bitcoin price movements by carefully selecting appropriate clustering methods. Additionally, we explore the impact of incorporating outliers of social media data into the Transformer model to improve prediction accuracy.
Given privacy demands in modern applications, such as reducing the risk of exposing sensitive data, we further study privacy-preserving deep learning. We leverage the strengths of federated learning and Transformer models and apply clustering techniques to address the challenge of privacy protection in large-scale deep learning. By combining these approaches, we can efficiently improve data privacy protection while enabling the application of powerful large-scale deep learning models.
Our research demonstrates the potential of clustering methods on social media data, Transformer models for forecasting market trends, and clustering federated learning for privacy-preserving deep learning. Overall, our findings underscore the importance of integrating advanced machine learning techniques and social media data in predicting financial market trends and developing privacy-preserving deep learning systems for real-world applications. |
Degree | Doctor of Philosophy |
Subject | Data mining; Data privacy; Cluster analysis - Data processing; Machine learning |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/352639 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhi, Yajing | - |
dc.contributor.author | 職亞婧 | - |
dc.date.accessioned | 2024-12-19T09:26:54Z | - |
dc.date.available | 2024-12-19T09:26:54Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/352639 | - |
dc.description.abstract | In this thesis, we explore the potential of integrating machine learning techniques, such as BERT, Transformer models, and federated learning, for predicting cryptocurrency price trends, specifically Bitcoin, and enabling collaborative and privacy-preserving deep learning for vision tasks. Analyzing almost five years of Reddit data, we investigate the Granger causality link between post volume, post sentiment, and Bitcoin price. Our findings demonstrate that post volume on Reddit better explains price trends than historical prices, emphasizing the importance of considering social media data in financial market predictions. We further evaluate the effectiveness of incorporating social media data and natural language processing methods, such as the Transformer model, in forecasting market trends. We apply different data clustering methods to process the original social media data in a rolling manner. We showcase the potential of the Transformer architecture in predicting Bitcoin price movements by carefully selecting appropriate clustering methods. Additionally, we explore the impact of incorporating outliers of social media data into the Transformer model to improve prediction accuracy. Given privacy demands in modern applications, such as reducing the risk of exposing sensitive data, we further study privacy-preserving deep learning. We leverage the strengths of federated learning and Transformer models and apply clustering techniques to address the challenge of privacy protection in large-scale deep learning. By combining these approaches, we can efficiently improve data privacy protection while enabling the application of powerful large-scale deep learning models. Our research demonstrates the potential of clustering methods on social media data, Transformer models for forecasting market trends, and clustering federated learning for privacy-preserving deep learning. Overall, our findings underscore the importance of integrating advanced machine learning techniques and social media data in predicting financial market trends and developing privacy-preserving deep learning systems for real-world applications. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Data mining | - |
dc.subject.lcsh | Data privacy | - |
dc.subject.lcsh | Cluster analysis - Data processing | - |
dc.subject.lcsh | Machine learning | - |
dc.title | Clustering for data analysis and privacy preservation in machine learning applications | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044891406603414 | - |