Clustering for data analysis and privacy preservation in machine learning applications

Zhi, Yajing; 職亞婧

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Clustering for data analysis and privacy preservation in machine learning applications

Title	Clustering for data analysis and privacy preservation in machine learning applications
Authors	Zhi, Yajing 職亞婧
Issue Date	2024
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	In this thesis, we explore the potential of integrating machine learning techniques, such as BERT, Transformer models, and federated learning, for predicting cryptocurrency price trends, specifically Bitcoin, and for enabling collaborative, privacy-preserving deep learning for vision tasks. Analyzing almost five years of Reddit data, we investigate the Granger causality links among post volume, post sentiment, and Bitcoin price. Our findings demonstrate that post volume on Reddit explains price trends better than historical prices do, emphasizing the importance of considering social media data in financial market predictions. We further evaluate the effectiveness of incorporating social media data and natural language processing methods, such as the Transformer model, in forecasting market trends. We apply different data clustering methods to process the original social media data in a rolling manner. By carefully selecting appropriate clustering methods, we showcase the potential of the Transformer architecture in predicting Bitcoin price movements. Additionally, we explore the impact of incorporating outliers in the social media data into the Transformer model to improve prediction accuracy. Given the privacy demands of modern applications, such as reducing the risk of exposing sensitive data, we further study privacy-preserving deep learning. We leverage the strengths of federated learning and Transformer models and apply clustering techniques to address the challenge of privacy protection in large-scale deep learning. By combining these approaches, we can efficiently improve data privacy protection while enabling the application of powerful large-scale deep learning models. Our research demonstrates the potential of clustering methods on social media data, of Transformer models for forecasting market trends, and of clustering-based federated learning for privacy-preserving deep learning.
Overall, our findings underscore the importance of integrating advanced machine learning techniques and social media data in predicting financial market trends and developing privacy-preserving deep learning systems for real-world applications.
Degree	Doctor of Philosophy
Subject	Data mining Data privacy Cluster analysis - Data processing Machine learning
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/352639

DC Field	Value	Language
dc.contributor.author	Zhi, Yajing	-
dc.contributor.author	職亞婧	-
dc.date.accessioned	2024-12-19T09:26:54Z	-
dc.date.available	2024-12-19T09:26:54Z	-
dc.date.issued	2024	-
dc.identifier.citation	Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/352639	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Data mining	-
dc.subject.lcsh	Data privacy	-
dc.subject.lcsh	Cluster analysis - Data processing	-
dc.subject.lcsh	Machine learning	-
dc.title	Clustering for data analysis and privacy preservation in machine learning applications	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2024	-
dc.identifier.mmsid	991044891406603414	-
