
Postgraduate thesis: Adaptive resource allocation for cloud computing and federated learning

Title: Adaptive resource allocation for cloud computing and federated learning
Authors: Du, Bingqian (杜冰倩)
Advisor(s): Wu, C
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Du, B. [杜冰倩]. (2022). Adaptive resource allocation for cloud computing and federated learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The efficiency of large computing systems, such as cloud computing platforms and distributed learning systems, relies heavily on the quality of resource allocation strategies. The advent of neural networks and deep learning has improved performance on a wide range of problems, which motivates us to tackle resource allocation problems in computing systems from the perspective of deep learning. We first study the online resource allocation and pricing problem from the deep learning angle, exploring whether online resource allocation algorithms can be designed, and worst cases understood, from scratch. We consider the single-type, non-recycled resource allocation and pricing problem and use adversarial learning to approach the worst-case competitive ratio/gap. Specifically, we use two neural networks (NNs) as the online algorithm and the adversary, respectively, and let them play a zero-sum game. We propose a single-round gradient descent method that breaks the complex sequence dependency to ensure better convergence. We show, both theoretically and empirically, that our method converges to a Nash equilibrium (NE) and achieves a better competitive ratio. Next, we study the VM allocation and pricing problem for cloud computing platforms. Traditional methods rely on careful problem formulation, which captures the highly complex dynamics of cloud computing platforms only suboptimally. Instead, we use a deep reinforcement learning (DRL) method to better capture these dynamics and to connect the optimal policy to the state of the system. We carefully design the states, actions, and rewards of the DRL model to combine time-series prediction with Markovian RL. Evaluation on real-world traces shows that our method outperforms existing white-box methods in both profit and the number of accepted users.
We further study the communication and privacy resource allocation problems in federated learning (FL) systems. Bandwidth in FL is a scarce resource, while neighbour feature transmission in geo-distributed graph training consumes a large amount of communication resources and dominates the whole training process. We propose conducting neighbour sampling periodically, trading off convergence error, runtime, and neighbour sampling frequency. We derive the optimal sampling interval from this relationship when there is no communication constraint, and propose an online algorithm for the constrained case. The experiment shows that the sampling interval found by our method achieves the best trade-off between convergence error and actual runtime. We next consider the privacy budget allocation problem for the FL framework. Differential privacy (DP) was introduced to FL to protect sensitive data. However, the uniform gradient clipping and noise addition of the DP mechanism cause significantly skewed performance degradation, and hence unfairness, among clients. We propose setting adaptive clipping values for different clients by analysing the effect of DP on the gradients of individual clients and deriving the relationship between loss variance and clipping values. The adaptively updated clipping values determine the noise variance and the privacy budget allocated for each model aggregation. Empirical results validate the effectiveness of our method in improving the fairness of FL.
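As a rough illustration of the last contribution described in the abstract, the sketch below shows the generic clip-then-add-noise step of differentially private aggregation, with a separate clipping value per client. It is not the thesis's algorithm: the function names `clip_update` and `private_aggregate`, the `noise_multiplier` parameter, and the choice to scale each client's noise by its own clipping value are all assumptions made for illustration, and the rule by which the thesis adapts the clipping values is not reproduced here.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale `update` so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_aggregate(updates, clip_norms, noise_multiplier, rng):
    """Clip each client update with its own bound, add Gaussian noise, average.

    Assumption of this sketch: the noise standard deviation for client i is
    noise_multiplier * clip_norms[i], so the clipping value sets the noise level.
    """
    dim = len(updates[0])
    total = [0.0] * dim
    for update, clip_norm in zip(updates, clip_norms):
        clipped = clip_update(update, clip_norm)
        sigma = noise_multiplier * clip_norm
        for j in range(dim):
            total[j] += clipped[j] + rng.gauss(0.0, sigma)
    return [t / len(updates) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    client_updates = [[3.0, 4.0], [0.3, 0.4]]
    # Hypothetical per-client clipping values; a uniform scheme would reuse one value.
    per_client_clip = [2.0, 0.5]
    print(private_aggregate(client_updates, per_client_clip, 0.1, rng))
```

With a uniform clipping value, the first client's large update is shrunk heavily while the second is left untouched, which hints at how a single bound can affect clients unevenly; passing one bound per client is the hook through which an adaptive scheme would act.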
Degree: Doctor of Philosophy
Subject: Machine learning; Cloud computing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/318311

 

DC Field | Value | Language
dc.contributor.advisor | Wu, C | -
dc.contributor.author | Du, Bingqian | -
dc.contributor.author | 杜冰倩 | -
dc.date.accessioned | 2022-10-10T08:18:40Z | -
dc.date.available | 2022-10-10T08:18:40Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Du, B. [杜冰倩]. (2022). Adaptive resource allocation for cloud computing and federated learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/318311 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Machine learning | -
dc.subject.lcsh | Cloud computing | -
dc.title | Adaptive resource allocation for cloud computing and federated learning | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044600193103414 | -
