Conference Paper: Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes

Title: Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes
Authors: Ying, D; Guo, MA; Ding, Y; Lavaei, J; Shen, ZJ
Issue Date: 7-Feb-2023
Publisher: AAAI Press
Abstract

We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature of the problem, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity, and prove an O(T^{-1/3}) convergence rate in terms of both the optimality gap and the constraint violation. When the objective is strongly concave in the occupancy measure, we prove an improved convergence rate of O(T^{-1/2}). By introducing a pessimistic term to the constraint, we further show that zero constraint violation can be achieved while preserving the same convergence rate for the optimality gap. This work is the first in the literature to establish non-asymptotic convergence guarantees for policy-based primal-dual methods for solving infinite-horizon discounted convex CMDPs.
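To make the algorithmic idea concrete, below is a minimal sketch of a policy-based primal-dual loop on a toy tabular CMDP. Everything in it (the random MDP, the linear reward and cost, the softmax policy parameterization, the step sizes, and the finite-difference gradient used in place of an analytic policy gradient) is an illustrative assumption; it is not the authors' implementation or experimental setup.

```python
import numpy as np

# Illustrative sketch of a policy-based primal-dual loop for a small tabular CMDP:
# primal variable (policy parameters) updated by gradient ascent on the Lagrangian,
# dual variable updated by projected sub-gradient descent. Toy problem, not the paper's setup.

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9

P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a, s']
r = rng.uniform(size=(S, A))                 # reward defining the (linear) objective
c = rng.uniform(size=(S, A))                 # cost defining the (linear) constraint
b = 0.5 / (1 - gamma)                        # constraint threshold: V_c <= b
rho = np.ones(S) / S                         # initial state distribution

def occupancy(theta):
    """Discounted state-action occupancy measure of the softmax policy pi_theta."""
    pi = np.exp(theta - theta.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    P_pi = np.einsum('sa,sap->sp', pi, P)    # state-to-state transition under pi
    d_s = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return d_s[:, None] * pi, pi

def lagrangian_grad(theta, lam, eps=1e-4):
    """Finite-difference gradient of the Lagrangian L = V_r - lam * (V_c - b)."""
    def L(th):
        d, _ = occupancy(th)
        v_r = (d * r).sum() / (1 - gamma)
        v_c = (d * c).sum() / (1 - gamma)
        return v_r - lam * (v_c - b)
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        e = np.zeros_like(theta); e[idx] = eps
        g[idx] = (L(theta + e) - L(theta - e)) / (2 * eps)
    return g

theta, lam = np.zeros((S, A)), 0.0
eta_theta, eta_lam = 0.5, 0.1
for t in range(200):
    theta += eta_theta * lagrangian_grad(theta, lam)   # primal: policy gradient ascent
    d, _ = occupancy(theta)
    violation = (d * c).sum() / (1 - gamma) - b
    lam = max(0.0, lam + eta_lam * violation)          # dual: projected sub-gradient descent

d, pi = occupancy(theta)
print("objective V_r:", (d * r).sum() / (1 - gamma))
print("constraint slack V_c - b:", (d * c).sum() / (1 - gamma) - b)
```

The linear reward and cost make this the classical CMDP special case of the convex setting studied in the paper; a general convex CMDP would replace V_r and V_c with a concave objective and convex constraints in the occupancy measure d.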


Persistent Identifier: http://hdl.handle.net/10722/336533
ISBN: 9781577358800

DC Field: Value
dc.contributor.author: Ying, D
dc.contributor.author: Guo, MA
dc.contributor.author: Ding, Y
dc.contributor.author: Lavaei, J
dc.contributor.author: Shen, ZJ
dc.date.accessioned: 2024-02-16T03:57:32Z
dc.date.available: 2024-02-16T03:57:32Z
dc.date.issued: 2023-02-07
dc.identifier.isbn: 9781577358800
dc.identifier.uri: http://hdl.handle.net/10722/336533
dc.description.abstract: We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature of the problem, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity, and prove an O(T^{-1/3}) convergence rate in terms of both the optimality gap and the constraint violation. When the objective is strongly concave in the occupancy measure, we prove an improved convergence rate of O(T^{-1/2}). By introducing a pessimistic term to the constraint, we further show that zero constraint violation can be achieved while preserving the same convergence rate for the optimality gap. This work is the first in the literature to establish non-asymptotic convergence guarantees for policy-based primal-dual methods for solving infinite-horizon discounted convex CMDPs.
dc.language: eng
dc.publisher: AAAI Press
dc.relation.ispartof: The 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (07/02/2023-14/02/2023, Washington, USA)
dc.title: Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes
dc.type: Conference_Paper
dc.identifier.scopus: eid_2-s2.0-85168250561
dc.identifier.volume: 37
dc.identifier.issue: 37
dc.identifier.spage: 10963
dc.identifier.epage: 10971
