Appears in Collections: Conference Paper: TapFinger: Task Placement and Fine-Grained Resource Allocation for Edge Machine Learning
Title | TapFinger: Task Placement and Fine-Grained Resource Allocation for Edge Machine Learning |
---|---|
Authors | Li, Yihong; Zeng, Tianyu; Zhang, Xiaoxi; Duan, Jingpu; Wu, Chuan |
Issue Date | 17-May-2023 |
Abstract | Machine learning (ML) tasks are one of the major workloads in today’s edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources flexibly for ML task performance optimization. This paper proposes TapFinger, a distributed scheduler that minimizes the total completion time of ML tasks in a multi-cluster edge network, through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks’ uncertain resource sensitivity and enable distributed online scheduling, we adopt multi-agent reinforcement learning (MARL), and propose several techniques to make it efficient for our ML-task resource allocation. First, TapFinger uses a heterogeneous graph attention network as the MARL backbone to abstract inter-related state features into more learnable environmental patterns. Second, the actor network is augmented through a tailored task selection phase, which decomposes the actions and encodes the optimization constraints. Third, to mitigate decision conflicts among agents, we novelly combine Bayes’ theorem and masking schemes to facilitate our MARL model training. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 28.6% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art resource schedulers. |
Persistent Identifier | http://hdl.handle.net/10722/333889 |
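The abstract mentions masking schemes that encode optimization constraints in the actor network. The record does not include the paper's implementation, but the standard form of this technique — setting the logits of infeasible actions to negative infinity before the softmax, so they receive zero probability — can be sketched as follows. This is a generic, minimal illustration in NumPy; the function and variable names are hypothetical and not taken from TapFinger.

```python
import numpy as np

def masked_softmax(logits, feasible):
    """Action masking: infeasible actions get probability exactly 0.

    logits   -- raw actor-network scores, one per action
    feasible -- boolean mask; False marks actions that violate a constraint
    """
    # Drive infeasible logits to -inf so exp() maps them to 0.
    masked = np.where(feasible, logits, -np.inf)
    # Subtract the max for numerical stability before exponentiating.
    shifted = masked - masked.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Four candidate actions; actions 1 and 3 are infeasible (e.g. the
# target cluster lacks the requested GPU resources).
logits = np.array([2.0, 1.0, 0.5, 3.0])
feasible = np.array([True, False, True, False])
probs = masked_softmax(logits, feasible)
```

Because the mask zeroes out infeasible actions before sampling, the policy never selects a constraint-violating placement, and gradients only flow through feasible choices during training.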
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Yihong | - |
dc.contributor.author | Zeng, Tianyu | - |
dc.contributor.author | Zhang, Xiaoxi | - |
dc.contributor.author | Duan, Jingpu | - |
dc.contributor.author | Wu, Chuan | - |
dc.date.accessioned | 2023-10-06T08:39:55Z | - |
dc.date.available | 2023-10-06T08:39:55Z | - |
dc.date.issued | 2023-05-17 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333889 | - |
dc.description.abstract | <p>Machine learning (ML) tasks are one of the major workloads in today’s edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources flexibly for ML task performance optimization. This paper proposes TapFinger, a distributed scheduler that minimizes the total completion time of ML tasks in a multi-cluster edge network, through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks’ uncertain resource sensitivity and enable distributed online scheduling, we adopt multi-agent reinforcement learning (MARL), and propose several techniques to make it efficient for our ML-task resource allocation. First, TapFinger uses a heterogeneous graph attention network as the MARL backbone to abstract inter-related state features into more learnable environmental patterns. Second, the actor network is augmented through a tailored task selection phase, which decomposes the actions and encodes the optimization constraints. Third, to mitigate decision conflicts among agents, we novelly combine Bayes’ theorem and masking schemes to facilitate our MARL model training. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 28.6% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art resource schedulers.</p> | - |
dc.language | eng | - |
dc.relation.ispartof | IEEE International Conference on Computer Communications (INFOCOM) 2023 (17/05/2023-20/05/2023, New York) | - |
dc.title | TapFinger: Task Placement and Fine-Grained Resource Allocation for Edge Machine Learning | - |
dc.type | Conference_Paper | - |
dc.identifier.doi | 10.1109/INFOCOM53939.2023.10229031 | - |