Conference Paper: Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization
| Title | Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization |
|---|---|
| Authors | Ganesh, Swetha; Chen, Jiayu; Mondal, Washim Uddin; Aggarwal, Vaneet |
| Issue Date | 2025 |
| Citation | Proceedings of Machine Learning Research, 2025, v. 286, p. 1358-1368 |
| Abstract | This paper addresses the challenge of achieving optimal sample complexity in reinforcement learning for Markov Decision Processes (MDPs) with general policy parameterization and multi-layer neural network critics. Existing approaches either fail to achieve the optimal rate or require impractical assumptions, such as access to knowledge of mixing times or the linearity of the critic. We introduce the Natural Actor-Critic with Data Drop (NAC-DD) algorithm, which integrates Natural Policy Gradient methods with a Data Drop technique to mitigate statistical dependencies inherent in Markovian sampling. NAC-DD achieves an optimal sample complexity of (formula presented), marking a significant improvement over the previous state-of-the-art guarantee of Õ(1/ϵ³). The algorithm employs a multi-layer neural network critic with differentiable activation functions, aligning with real-world applications where tabular policies and linear critics are insufficient. Our work represents the first to achieve order-optimal sample complexity for actor-critic methods with neural function approximation, continuous state and action spaces, and Markovian sampling. Empirical evaluations on benchmark tasks confirm the theoretical findings, demonstrating the practical efficacy of the proposed method. |
| Persistent Identifier | http://hdl.handle.net/10722/360975 |
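The "Data Drop" idea described in the abstract — discarding intermediate samples from a Markovian trajectory so that the retained samples are only weakly dependent — can be illustrated with a minimal sketch. This is not the paper's NAC-DD algorithm; the toy two-state chain, the block length `B`, and the function names here are illustrative assumptions only.

```python
import random

def markov_chain_step(state):
    # Toy two-state Markov chain used purely for illustration:
    # stay in the current state with probability 0.9, switch otherwise.
    return state if random.random() < 0.9 else 1 - state

def data_drop_samples(num_kept, block_len, init_state=0):
    """Run the chain, but keep only every `block_len`-th sample.

    Dropping the intermediate samples weakens the statistical
    dependence between consecutive retained samples, which is the
    general idea behind data-drop style subsampling under
    Markovian sampling.
    """
    state = init_state
    kept = []
    while len(kept) < num_kept:
        for _ in range(block_len):
            state = markov_chain_step(state)
        kept.append(state)
    return kept

random.seed(0)
samples = data_drop_samples(num_kept=5, block_len=50)
print(samples)  # five retained states, each 0 or 1
```

With a block length well above the chain's mixing time, the retained samples behave approximately like i.i.d. draws from the stationary distribution, which is what makes standard concentration arguments applicable in the analysis.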
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ganesh, Swetha | - |
| dc.contributor.author | Chen, Jiayu | - |
| dc.contributor.author | Mondal, Washim Uddin | - |
| dc.contributor.author | Aggarwal, Vaneet | - |
| dc.date.accessioned | 2025-09-16T04:14:05Z | - |
| dc.date.available | 2025-09-16T04:14:05Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Proceedings of Machine Learning Research, 2025, v. 286, p. 1358-1368 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/360975 | - |
| dc.description.abstract | This paper addresses the challenge of achieving optimal sample complexity in reinforcement learning for Markov Decision Processes (MDPs) with general policy parameterization and multi-layer neural network critics. Existing approaches either fail to achieve the optimal rate or require impractical assumptions, such as access to knowledge of mixing times or the linearity of the critic. We introduce the Natural Actor-Critic with Data Drop (NAC-DD) algorithm, which integrates Natural Policy Gradient methods with a Data Drop technique to mitigate statistical dependencies inherent in Markovian sampling. NAC-DD achieves an optimal sample complexity of (formula presented), marking a significant improvement over the previous state-of-the-art guarantee of Õ(1/ϵ<sup>3</sup>). The algorithm employs a multi-layer neural network critic with differentiable activation functions, aligning with real-world applications where tabular policies and linear critics are insufficient. Our work represents the first to achieve order-optimal sample complexity for actor-critic methods with neural function approximation, continuous state and action spaces, and Markovian sampling. Empirical evaluations on benchmark tasks confirm the theoretical findings, demonstrating the practical efficacy of the proposed method. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Proceedings of Machine Learning Research | - |
| dc.title | Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.scopus | eid_2-s2.0-105014722535 | - |
| dc.identifier.volume | 286 | - |
| dc.identifier.spage | 1358 | - |
| dc.identifier.epage | 1368 | - |
| dc.identifier.eissn | 2640-3498 | - |
