Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/TNSM.2024.3484213
- Scopus: eid_2-s2.0-105001078273
Article: Learning-Based Two-Tiered Online Optimization of Region-Wide Datacenter Resource Allocation
| Title | Learning-Based Two-Tiered Online Optimization of Region-Wide Datacenter Resource Allocation |
|---|---|
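| Authors | Chen, Chang Lin; Zhou, Hanhan; Chen, Jiayu; Pedramfar, Mohammad; Lan, Tian; Zhu, Zheqing; Zhou, Chi; Mauri Ruiz, Pol; Kumar, Neeraj; Dong, Hongbo; Aggarwal, Vaneet |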
| Keywords | capacity reservation; Cloud computing; deep reinforcement learning; explainable reinforcement learning |
| Issue Date | 2025 |
| Citation | IEEE Transactions on Network and Service Management, 2025, v. 22, n. 1, p. 572-581 |
| Abstract | Online optimization of resource management for large-scale data centers and infrastructures to meet dynamic capacity reservation demands and various practical constraints (e.g., feasibility and robustness) is a very challenging problem. Mixed Integer Programming (MIP) approaches suffer from recognized limitations in such a dynamic environment, while learning-based approaches may face prohibitively large state/action spaces. To this end, this paper presents a novel two-tiered online optimization to enable a learning-based Resource Allowance System (RAS). To solve optimal server-to-reservation assignment in RAS in an online fashion, the proposed solution leverages a reinforcement learning (RL) agent to make high-level decisions, e.g., how much resource to select from the Main Switch Boards (MSBs), and then a low-level Mixed Integer Linear Programming (MILP) solver to generate the local server-to-reservation mapping, conditioned on the RL decisions. We take into account fault tolerance, server movement minimization, and network affinity requirements and apply the proposed solution to large-scale RAS problems. To provide interpretability, we further train a decision tree model to explain the learned policies and to prune unreasonable corner cases at the low-level MILP solver, resulting in further performance improvement. Extensive evaluations show that our two-tiered solution outperforms baselines such as a pure MIP solver by over 15% while delivering a 100× speedup in computation. |
| Persistent Identifier | http://hdl.handle.net/10722/360934 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Chen, Chang Lin | - |
| dc.contributor.author | Zhou, Hanhan | - |
| dc.contributor.author | Chen, Jiayu | - |
| dc.contributor.author | Pedramfar, Mohammad | - |
| dc.contributor.author | Lan, Tian | - |
| dc.contributor.author | Zhu, Zheqing | - |
| dc.contributor.author | Zhou, Chi | - |
| dc.contributor.author | Mauri Ruiz, Pol | - |
| dc.contributor.author | Kumar, Neeraj | - |
| dc.contributor.author | Dong, Hongbo | - |
| dc.contributor.author | Aggarwal, Vaneet | - |
| dc.date.accessioned | 2025-09-16T04:13:30Z | - |
| dc.date.available | 2025-09-16T04:13:30Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | IEEE Transactions on Network and Service Management, 2025, v. 22, n. 1, p. 572-581 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/360934 | - |
| dc.description.abstract | Online optimization of resource management for large-scale data centers and infrastructures to meet dynamic capacity reservation demands and various practical constraints (e.g., feasibility and robustness) is a very challenging problem. Mixed Integer Programming (MIP) approaches suffer from recognized limitations in such a dynamic environment, while learning-based approaches may face prohibitively large state/action spaces. To this end, this paper presents a novel two-tiered online optimization to enable a learning-based Resource Allowance System (RAS). To solve optimal server-to-reservation assignment in RAS in an online fashion, the proposed solution leverages a reinforcement learning (RL) agent to make high-level decisions, e.g., how much resource to select from the Main Switch Boards (MSBs), and then a low-level Mixed Integer Linear Programming (MILP) solver to generate the local server-to-reservation mapping, conditioned on the RL decisions. We take into account fault tolerance, server movement minimization, and network affinity requirements and apply the proposed solution to large-scale RAS problems. To provide interpretability, we further train a decision tree model to explain the learned policies and to prune unreasonable corner cases at the low-level MILP solver, resulting in further performance improvement. Extensive evaluations show that our two-tiered solution outperforms baselines such as a pure MIP solver by over 15% while delivering a 100× speedup in computation. | - |
| dc.language | eng | - |
| dc.relation.ispartof | IEEE Transactions on Network and Service Management | - |
| dc.subject | capacity reservation | - |
| dc.subject | Cloud computing | - |
| dc.subject | deep reinforcement learning | - |
| dc.subject | explainable reinforcement learning | - |
| dc.title | Learning-Based Two-Tiered Online Optimization of Region-Wide Datacenter Resource Allocation | - |
| dc.type | Article | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1109/TNSM.2024.3484213 | - |
| dc.identifier.scopus | eid_2-s2.0-105001078273 | - |
| dc.identifier.volume | 22 | - |
| dc.identifier.issue | 1 | - |
| dc.identifier.spage | 572 | - |
| dc.identifier.epage | 581 | - |
| dc.identifier.eissn | 1932-4537 | - |
