Adaptive resource allocation for cloud computing and federated learning

Du, Bingqian; 杜冰倩

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Adaptive resource allocation for cloud computing and federated learning

Title	Adaptive resource allocation for cloud computing and federated learning
Authors	Du, Bingqian 杜冰倩
Advisors	Wu, C
Issue Date	2022
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Du, B. [杜冰倩]. (2022). Adaptive resource allocation for cloud computing and federated learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	The efficiency of large computing systems, such as cloud computing platforms and distributed learning systems, relies heavily on the quality of their resource allocation strategies. The advent of neural networks and deep learning has improved performance on a wide range of problems, which motivates us to tackle resource allocation problems in computing systems from the perspective of deep learning. We first study the online resource allocation and pricing problem from the deep learning angle, exploring whether online resource allocation algorithms can be designed, and worst cases understood, from scratch. We consider the single-type, non-recycled resource allocation and pricing problem and utilise adversarial learning to approach the worst-case competitive ratio/gap. Specifically, we use two neural networks (NNs) as the online algorithm and the adversary respectively, and let them play a zero-sum game. We propose a single-round gradient descent method that breaks the complex sequence dependency to ensure better convergence. We show, both theoretically and empirically, that our method converges to a Nash equilibrium (NE) and achieves a better competitive ratio. Next, we study the VM allocation and pricing problem for cloud computing platforms. Traditional methods rely on careful problem formulation, which is suboptimal for capturing the highly complex dynamics of cloud computing platforms. Instead, we resort to a deep reinforcement learning (DRL) method to better capture the dynamics and build the connection between the optimal policy and the system states. We carefully design the states, actions, and rewards in DRL to combine time series prediction and Markovian RL. Evaluation based on real-world traces shows that our method outperforms existing white-box methods in both profit and the number of accepted users. We further study the communication and privacy resource allocation problems in federated learning (FL) systems.
Bandwidth in FL is a scarce resource, while neighbour feature transmission in geo-distributed graph training consumes a large amount of communication resources and dominates the whole training process. We propose conducting neighbour sampling periodically, trading off convergence error, runtime, and neighbour sampling frequency. We derive the optimal sampling interval from this relationship for the case without communication constraints and propose an online algorithm for the constrained case. Experiments show that the sampling interval found by our method achieves the best trade-off between convergence error and actual runtime. We next consider the privacy budget allocation problem for the FL framework. Differential privacy (DP) was introduced to FL to protect sensitive data. However, the uniform gradient clipping and noise addition of the DP mechanism cause significantly skewed performance degradation, and hence unfairness, among clients. We propose setting adaptive clipping values for different clients by analysing the effect of DP on the gradients of individual clients and deriving the relationship between loss variance and clipping values. The adaptively updated clipping values determine the noise variance and the privacy budget allocated for each model aggregation. Empirical results validate the effectiveness of our method in improving the fairness of FL.
Degree	Doctor of Philosophy
Subject	Machine learning; Cloud computing
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/318311
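
The abstract's final contribution, per-client clipping values that determine the noise variance, can be illustrated with a minimal sketch of the standard Gaussian mechanism applied to client updates. This is not the thesis's algorithm: the record does not say how clipping values are adapted from the loss variance, so `clip_values` is simply passed in, and the names `clip_update`, `privatize`, `aggregate`, and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_update(update, clip_value):
    """Scale `update` so that its L2 norm is at most `clip_value`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_value / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, clip_value, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_value, so a
    client's clipping value directly sets the variance of its noise.
    """
    clipped = clip_update(update, clip_value)
    sigma = noise_multiplier * clip_value
    return [x + rng.gauss(0.0, sigma) for x in clipped]


def aggregate(client_updates, clip_values, noise_multiplier, seed=0):
    """Average the privatized updates, one clipping value per client."""
    rng = random.Random(seed)
    noisy = [
        privatize(u, c, noise_multiplier, rng)
        for u, c in zip(client_updates, clip_values)
    ]
    return [sum(column) / len(noisy) for column in zip(*noisy)]


if __name__ == "__main__":
    updates = [[3.0, 4.0], [0.3, 0.4]]
    # Assumed values: a looser clip for the client with the larger update.
    print(aggregate(updates, clip_values=[2.0, 0.5], noise_multiplier=1.0))
```

With a uniform clipping value, a client whose updates are typically large loses more signal to clipping than one whose updates are small, which is one way the skew described in the abstract can arise. Passing a separate value per client is the hook an adaptive scheme would use.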

DC Field	Value	Language
dc.contributor.advisor	Wu, C	-
dc.contributor.author	Du, Bingqian	-
dc.contributor.author	杜冰倩	-
dc.date.accessioned	2022-10-10T08:18:40Z	-
dc.date.available	2022-10-10T08:18:40Z	-
dc.date.issued	2022	-
dc.identifier.citation	Du, B. [杜冰倩]. (2022). Adaptive resource allocation for cloud computing and federated learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/318311	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Machine learning	-
dc.subject.lcsh	Cloud computing	-
dc.title	Adaptive resource allocation for cloud computing and federated learning	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2022	-
dc.identifier.mmsid	991044600193103414	-
