
postgraduate thesis: Clustering for data analysis and privacy preservation in machine learning applications

Title: Clustering for data analysis and privacy preservation in machine learning applications
Authors
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation
Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: In this thesis, we explore the potential of integrating machine learning techniques, such as BERT, Transformer models, and federated learning, for predicting cryptocurrency price trends, specifically those of Bitcoin, and for enabling collaborative and privacy-preserving deep learning for vision tasks. Analyzing almost five years of Reddit data, we investigate the Granger causality link between post volume, post sentiment, and Bitcoin price. Our findings demonstrate that post volume on Reddit better explains price trends than historical prices, emphasizing the importance of considering social media data in financial market predictions. We further evaluate the effectiveness of incorporating social media data and natural language processing methods, such as the Transformer model, in forecasting market trends. We apply different data clustering methods to process the original social media data in a rolling manner. We showcase the potential of the Transformer architecture in predicting Bitcoin price movements by carefully selecting appropriate clustering methods. Additionally, we explore the impact of incorporating outliers of social media data into the Transformer model to improve prediction accuracy. Given the privacy demands of modern applications, such as reducing the risk of exposing sensitive data, we further study privacy-preserving deep learning. We leverage the strengths of federated learning and Transformer models and apply clustering techniques to address the challenge of privacy protection in large-scale deep learning. By combining these approaches, we can efficiently improve data privacy protection while enabling the application of powerful large-scale deep learning models. Our research demonstrates the potential of clustering methods on social media data, Transformer models for forecasting market trends, and clustering-based federated learning for privacy-preserving deep learning.
Overall, our findings underscore the importance of integrating advanced machine learning techniques and social media data in predicting financial market trends and developing privacy-preserving deep learning systems for real-world applications.
Degree: Doctor of Philosophy
Subject: Data mining; Data privacy; Cluster analysis - Data processing; Machine learning
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/352639

 

DC Field | Value | Language
dc.contributor.author | Zhi, Yajing | -
dc.contributor.author | 職亞婧 | -
dc.date.accessioned | 2024-12-19T09:26:54Z | -
dc.date.available | 2024-12-19T09:26:54Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Zhi, Y. [職亞婧]. (2024). Clustering for data analysis and privacy preservation in machine learning applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/352639 | -
dc.description.abstract | In this thesis, we explore the potential of integrating machine learning techniques, such as BERT, Transformer models, and federated learning, for predicting cryptocurrency price trends, specifically those of Bitcoin, and for enabling collaborative and privacy-preserving deep learning for vision tasks. Analyzing almost five years of Reddit data, we investigate the Granger causality link between post volume, post sentiment, and Bitcoin price. Our findings demonstrate that post volume on Reddit better explains price trends than historical prices, emphasizing the importance of considering social media data in financial market predictions. We further evaluate the effectiveness of incorporating social media data and natural language processing methods, such as the Transformer model, in forecasting market trends. We apply different data clustering methods to process the original social media data in a rolling manner. We showcase the potential of the Transformer architecture in predicting Bitcoin price movements by carefully selecting appropriate clustering methods. Additionally, we explore the impact of incorporating outliers of social media data into the Transformer model to improve prediction accuracy. Given the privacy demands of modern applications, such as reducing the risk of exposing sensitive data, we further study privacy-preserving deep learning. We leverage the strengths of federated learning and Transformer models and apply clustering techniques to address the challenge of privacy protection in large-scale deep learning. By combining these approaches, we can efficiently improve data privacy protection while enabling the application of powerful large-scale deep learning models. Our research demonstrates the potential of clustering methods on social media data, Transformer models for forecasting market trends, and clustering-based federated learning for privacy-preserving deep learning. Overall, our findings underscore the importance of integrating advanced machine learning techniques and social media data in predicting financial market trends and developing privacy-preserving deep learning systems for real-world applications. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Data mining | -
dc.subject.lcsh | Data privacy | -
dc.subject.lcsh | Cluster analysis - Data processing | -
dc.subject.lcsh | Machine learning | -
dc.title | Clustering for data analysis and privacy preservation in machine learning applications | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044891406603414 | -
