Conference Paper: Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization

Title: Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization
Authors: Ganesh, Swetha; Chen, Jiayu; Mondal, Washim Uddin; Aggarwal, Vaneet
Issue Date: 2025
Citation: Proceedings of Machine Learning Research, 2025, v. 286, p. 1358-1368
Abstract: This paper addresses the challenge of achieving optimal sample complexity in reinforcement learning for Markov Decision Processes (MDPs) with general policy parameterization and multi-layer neural network critics. Existing approaches either fail to achieve the optimal rate or require impractical assumptions, such as access to knowledge of mixing times or the linearity of the critic. We introduce the Natural Actor-Critic with Data Drop (NAC-DD) algorithm, which integrates Natural Policy Gradient methods with a Data Drop technique to mitigate statistical dependencies inherent in Markovian sampling. NAC-DD achieves an optimal sample complexity of Õ(1/ϵ²), marking a significant improvement over the previous state-of-the-art guarantee of Õ(1/ϵ³). The algorithm employs a multi-layer neural network critic with differentiable activation functions, aligning with real-world applications where tabular policies and linear critics are insufficient. Our work represents the first to achieve order-optimal sample complexity for actor-critic methods with neural function approximation, continuous state and action spaces, and Markovian sampling. Empirical evaluations on benchmark tasks confirm the theoretical findings, demonstrating the practical efficacy of the proposed method.
Persistent Identifier: http://hdl.handle.net/10722/360975
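The "Data Drop" technique named in the abstract discards a fixed number of Markovian transitions between the samples actually used for updates, so that the retained samples are only weakly correlated. The following Python sketch illustrates that subsampling idea on a toy Markov chain; the chain, the function names, and the drop_gap parameter are illustrative assumptions, not the interface or the algorithm from the paper, which additionally performs natural-policy-gradient actor updates and trains a multi-layer neural critic on the retained data.

import numpy as np

def markov_step(state, rng):
    # Toy Markov chain on {0, ..., 9}: move one step left or right uniformly.
    return (state + rng.choice([-1, 1])) % 10

def collect_with_data_drop(n_samples, drop_gap, rng):
    # Keep one state per (drop_gap + 1) transitions; the dropped transitions
    # reduce the statistical dependence between consecutive retained samples.
    state = 0
    kept = []
    for _ in range(n_samples):
        for _ in range(drop_gap + 1):
            state = markov_step(state, rng)  # transitions that are dropped or kept below
        kept.append(state)                   # retained, weakly correlated sample
    return np.array(kept)

rng = np.random.default_rng(0)
samples = collect_with_data_drop(n_samples=1000, drop_gap=5, rng=rng)
print("mean of retained states:", samples.mean())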

 

DC Field: Value
dc.contributor.author: Ganesh, Swetha
dc.contributor.author: Chen, Jiayu
dc.contributor.author: Mondal, Washim Uddin
dc.contributor.author: Aggarwal, Vaneet
dc.date.accessioned: 2025-09-16T04:14:05Z
dc.date.available: 2025-09-16T04:14:05Z
dc.date.issued: 2025
dc.identifier.citation: Proceedings of Machine Learning Research, 2025, v. 286, p. 1358-1368
dc.identifier.uri: http://hdl.handle.net/10722/360975
dc.description.abstract: This paper addresses the challenge of achieving optimal sample complexity in reinforcement learning for Markov Decision Processes (MDPs) with general policy parameterization and multi-layer neural network critics. Existing approaches either fail to achieve the optimal rate or require impractical assumptions, such as access to knowledge of mixing times or the linearity of the critic. We introduce the Natural Actor-Critic with Data Drop (NAC-DD) algorithm, which integrates Natural Policy Gradient methods with a Data Drop technique to mitigate statistical dependencies inherent in Markovian sampling. NAC-DD achieves an optimal sample complexity of Õ(1/ϵ²), marking a significant improvement over the previous state-of-the-art guarantee of Õ(1/ϵ³). The algorithm employs a multi-layer neural network critic with differentiable activation functions, aligning with real-world applications where tabular policies and linear critics are insufficient. Our work represents the first to achieve order-optimal sample complexity for actor-critic methods with neural function approximation, continuous state and action spaces, and Markovian sampling. Empirical evaluations on benchmark tasks confirm the theoretical findings, demonstrating the practical efficacy of the proposed method.
dc.language: eng
dc.relation.ispartof: Proceedings of Machine Learning Research
dc.title: Order-Optimal Global Convergence for Actor-Critic with General Policy and Neural Critic Parametrization
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.scopus: eid_2-s2.0-105014722535
dc.identifier.volume: 286
dc.identifier.spage: 1358
dc.identifier.epage: 1368
dc.identifier.eissn: 2640-3498
