Article: Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Title: Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction
Authors: Ying, Donghao; Guo, Mengzi Amy; Lee, Hyunin; Ding, Yuhao; Lavaei, Javad; Shen, Zuo Jun Max
Keywords: machine learning; markov decision processes; reinforcement learning
Issue Date: 29-Aug-2025
Publisher: AI Access Foundation
Citation: Journal of Artificial Intelligence Research, 2025, v. 83
Abstract:

We study Concave Constrained Markov Decision Processes (Concave CMDPs), where both the objective and the constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an O(T^{-1/3}) convergence rate for both the average optimality gap and the constraint violation, which further improves to O(T^{-1/2}) under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an (Formula Presented) sample complexity for ε-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG attains zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate our methods through numerical experiments.
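The sketch below is a minimal illustration of the primal-dual structure described in the abstract, not the paper's VR-PDPG: it uses a toy one-step problem in which the policy's action distribution stands in for the occupancy measure, with a concave (entropy) objective and a single linear constraint. The primal variable (softmax logits) is updated by gradient ascent on the Lagrangian and the dual variable by projected sub-gradient descent; the variance-reduction and occupancy-measure-estimation components of VR-PDPG are omitted, and all numbers (u, b, step sizes) are hypothetical.

```python
import numpy as np

# Toy one-step "CMDP": one state with K actions; the policy's action
# distribution plays the role of the occupancy measure.
# Objective: policy entropy (concave in the distribution).
# Constraint: expected utility u.p >= b (linear, hence concave).
# u, b, and the step sizes are illustrative assumptions, not from the paper.
K = 4
u = np.array([1.0, 0.2, 0.1, 0.0])   # per-action utility (hypothetical)
b = 0.4                               # constraint threshold (hypothetical)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(K)      # primal variable: policy logits
lam = 0.0                # dual variable: Lagrange multiplier
eta_p, eta_d = 0.5, 0.5  # primal / dual step sizes

for t in range(2000):
    p = softmax(theta)
    # Gradients of objective and constraint w.r.t. the distribution p.
    grad_f = -(np.log(p) + 1.0)        # d/dp of entropy -sum_i p_i log p_i
    grad_g = u                         # d/dp of (u.p - b)
    grad_L_p = grad_f + lam * grad_g   # Lagrangian gradient w.r.t. p
    # Chain rule through softmax: dL/dtheta_j = sum_i dL/dp_i * p_i*(1[i=j] - p_j)
    grad_theta = p * (grad_L_p - np.dot(p, grad_L_p))
    theta += eta_p * grad_theta                        # primal: gradient ascent
    lam = max(0.0, lam - eta_d * (np.dot(u, p) - b))   # dual: projected descent

p = softmax(theta)
print("policy:", np.round(p, 3))
print("entropy:", -(p * np.log(p)).sum(), "constraint value:", np.dot(u, p) - b)
```

On this toy instance the multiplier grows while the expected utility is below the threshold b, steering the policy toward high-utility actions; once the constraint is met, the policy maximizes entropy on or near the constraint boundary.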


Persistent Identifier: http://hdl.handle.net/10722/368249
ISSN: 1076-9757
2023 Impact Factor: 4.5
2023 SCImago Journal Rankings: 1.614

 

DC Field: Value
dc.contributor.author: Ying, Donghao
dc.contributor.author: Guo, Mengzi Amy
dc.contributor.author: Lee, Hyunin
dc.contributor.author: Ding, Yuhao
dc.contributor.author: Lavaei, Javad
dc.contributor.author: Shen, Zuo Jun Max
dc.date.accessioned: 2025-12-24T00:37:05Z
dc.date.available: 2025-12-24T00:37:05Z
dc.date.issued: 2025-08-29
dc.identifier.citation: Journal of Artificial Intelligence Research, 2025, v. 83
dc.identifier.issn: 1076-9757
dc.identifier.uri: http://hdl.handle.net/10722/368249
dc.description.abstract: We study Concave Constrained Markov Decision Processes (Concave CMDPs), where both the objective and the constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an O(T^{-1/3}) convergence rate for both the average optimality gap and the constraint violation, which further improves to O(T^{-1/2}) under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an (Formula Presented) sample complexity for ε-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG attains zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate our methods through numerical experiments.
dc.language: eng
dc.publisher: AI Access Foundation
dc.relation.ispartof: Journal of Artificial Intelligence Research
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: machine learning
dc.subject: markov decision processes
dc.subject: reinforcement learning
dc.title: Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction
dc.type: Article
dc.identifier.doi: 10.1613/jair.1.18129
dc.identifier.scopus: eid_2-s2.0-105018737537
dc.identifier.volume: 83
dc.identifier.eissn: 1943-5037
dc.identifier.issnl: 1076-9757
