Article: A multi-agent reinforcement learning (MARL) framework for designing an optimal state-specific hybrid maintenance policy for a series k-out-of-n load-sharing system
| Title | A multi-agent reinforcement learning (MARL) framework for designing an optimal state-specific hybrid maintenance policy for a series k-out-of-n load-sharing system |
|---|---|
| Authors | Zhao, Sangqi; Wei, Yian; Li, Yang; Cheng, Yao |
| Keywords | Markov decision process; Multi-agent reinforcement learning; Series k-out-of-n load-sharing system; State-specific hybrid maintenance policy |
| Issue Date | 1-Jan-2026 |
| Publisher | Elsevier |
| Citation | Reliability Engineering & System Safety, 2026, v. 265 |
| Abstract | The series k-out-of-n: G load-sharing structure is widely adopted in engineering. During operation, system components are subject to deterioration that causes system failures and shutdowns. Although maintenance reduces system failure-associated costs, it also requires system shutdown and incurs considerable costs. This calls for a maintenance policy that minimizes the overall long-term cost rate. When the components have continuous and load-dependent deterioration processes and the maintenance duration is non-negligible, the task becomes especially challenging. In this paper, we propose a Markov decision process (MDP)-based multi-agent reinforcement learning (MARL) framework to obtain an optimal state-specific hybrid maintenance policy that determines the maintenance timing and levels for all components holistically. First, we define the policy that dictates whether each component undergoes imperfect repair or replacement at periodic decision epochs. Second, we establish an MDP-based multi-agent framework to quantify the system's cost rate by defining the state and action spaces, modeling the stochastic transitions of components' dependent deterioration processes, and formulating a well-calibrated penalty function. Third, we customize a MARL algorithm that leverages neural networks to handle the large state space and integrates the Branching Dueling Network structure to decompose the high-dimensional action space, thereby improving scalability. A heuristic-enhanced penalty function is designed to avoid suboptimal policies. A power plant case study demonstrates the effectiveness of the proposed policy and underscores the importance of accounting for maintenance duration in policy design. |
| Persistent Identifier | http://hdl.handle.net/10722/362732 |
| ISSN | 0951-8320 |
| Journal Metrics | 2023 Impact Factor: 9.4; 2023 SCImago Journal Rankings: 2.028 |
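The objective named in the abstract, minimizing the overall long-term cost rate, is commonly formalized as an average-cost MDP criterion. A minimal sketch follows, with the notation (C_t, s_0, pi) assumed for illustration rather than taken from the paper:

```latex
% Long-run average cost rate under a policy \pi (standard
% average-cost MDP criterion; C_t, s_0, and \pi are assumed
% notation for illustration, not taken from the paper).
g(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T-1} C_t \,\middle|\, s_0 \right],
\qquad
\pi^{\ast} \in \arg\min_{\pi} g(\pi)
```

Here C_t would aggregate the per-epoch maintenance, shutdown, and penalty costs; the well-calibrated penalty function the abstract describes would enter the objective through this term.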
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Zhao, Sangqi | - |
| dc.contributor.author | Wei, Yian | - |
| dc.contributor.author | Li, Yang | - |
| dc.contributor.author | Cheng, Yao | - |
| dc.date.accessioned | 2025-09-27T00:35:28Z | - |
| dc.date.available | 2025-09-27T00:35:28Z | - |
| dc.date.issued | 2026-01-01 | - |
| dc.identifier.citation | Reliability Engineering & System Safety, 2026, v. 265 | - |
| dc.identifier.issn | 0951-8320 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/362732 | - |
| dc.description.abstract | The series k-out-of-n: G load-sharing structure is widely adopted in engineering. During operation, system components are subject to deterioration that causes system failures and shutdowns. Although maintenance reduces system failure-associated costs, it also requires system shutdown and incurs considerable costs. This calls for a maintenance policy that minimizes the overall long-term cost rate. When the components have continuous and load-dependent deterioration processes and the maintenance duration is non-negligible, the task becomes especially challenging. In this paper, we propose a Markov decision process (MDP)-based multi-agent reinforcement learning (MARL) framework to obtain an optimal state-specific hybrid maintenance policy that determines the maintenance timing and levels for all components holistically. First, we define the policy that dictates whether each component undergoes imperfect repair or replacement at periodic decision epochs. Second, we establish an MDP-based multi-agent framework to quantify the system's cost rate by defining the state and action spaces, modeling the stochastic transitions of components' dependent deterioration processes, and formulating a well-calibrated penalty function. Third, we customize a MARL algorithm that leverages neural networks to handle the large state space and integrates the Branching Dueling Network structure to decompose the high-dimensional action space, thereby improving scalability. A heuristic-enhanced penalty function is designed to avoid suboptimal policies. A power plant case study demonstrates the effectiveness of the proposed policy and underscores the importance of accounting for maintenance duration in policy design. | - |
| dc.language | eng | - |
| dc.publisher | Elsevier | - |
| dc.relation.ispartof | Reliability Engineering & System Safety | - |
| dc.subject | Markov decision process | - |
| dc.subject | Multi-agent reinforcement learning | - |
| dc.subject | Series k-out-of-n load-sharing system | - |
| dc.subject | State-specific hybrid maintenance policy | - |
| dc.title | A multi-agent reinforcement learning (MARL) framework for designing an optimal state-specific hybrid maintenance policy for a series k-out-of-n load-sharing system | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1016/j.ress.2025.111587 | - |
| dc.identifier.scopus | eid_2-s2.0-105013846326 | - |
| dc.identifier.volume | 265 | - |
| dc.identifier.issnl | 0951-8320 | - |
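The abstract states that the customized MARL algorithm integrates the Branching Dueling Network structure to decompose the high-dimensional joint action space. Below is a minimal PyTorch sketch of such a branching dueling Q-network, with one action branch per component and a three-option hybrid-maintenance action set (do nothing, imperfect repair, replace) implied by the policy description. All dimensions, names, and the action-set size are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a Branching Dueling Q-Network (BDQ) in PyTorch.
# STATE_DIM, N_COMPONENTS, and N_ACTIONS are assumed for illustration.
import torch
import torch.nn as nn

STATE_DIM = 8        # e.g., one deterioration level per component (assumed)
N_COMPONENTS = 8     # one action branch per component (assumed)
N_ACTIONS = 3        # {do nothing, imperfect repair, replace} (assumed)

class BranchingDuelingQNet(nn.Module):
    """Shared trunk, one shared state-value head, and one advantage head
    per action branch; Q_d(s, a) = V(s) + A_d(s, a) - mean_a' A_d(s, a')."""
    def __init__(self, state_dim=STATE_DIM, n_branches=N_COMPONENTS,
                 n_actions=N_ACTIONS, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)  # shared state value V(s)
        self.advantages = nn.ModuleList(
            nn.Linear(hidden, n_actions) for _ in range(n_branches)
        )

    def forward(self, state):
        h = self.trunk(state)
        v = self.value(h)                          # (batch, 1)
        qs = []
        for adv_head in self.advantages:
            a = adv_head(h)                        # (batch, n_actions)
            qs.append(v + a - a.mean(dim=1, keepdim=True))
        return torch.stack(qs, dim=1)              # (batch, branches, actions)

# Greedy joint maintenance action: argmax independently per branch.
net = BranchingDuelingQNet()
state = torch.randn(1, STATE_DIM)                  # dummy system state
action = net(state).argmax(dim=-1)                 # shape (1, N_COMPONENTS)
```

The design point of the branching structure is scalability: the joint action space grows as 3^n for n components, while the branched Q-heads keep the network output linear in n, and the greedy joint action reduces to a per-branch argmax.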
