
Article: Stochastic Approximation for Risk-Aware Markov Decision Processes

Title: Stochastic Approximation for Risk-Aware Markov Decision Processes
Authors: Huang, W; Haskell, WB
Keywords: Markov decision processes (MDPs); risk measure; saddle point; stochastic approximation; Q-learning
Issue Date: 2021
Publisher: Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=9
Citation: IEEE Transactions on Automatic Control, 2021, v. 66 n. 3, p. 1314-1320
Abstract: We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε > 0 on the optimal Q-value estimation gap and a learning rate k ∈ (1/2, 1], the overall convergence rate of our algorithm is Ω((ln(1/δε)/ε^2)^(1/k) + (ln(1/ε))^(1/(1-k))) with probability at least 1 - δ.
Persistent Identifier: http://hdl.handle.net/10722/305821
ISSN: 0018-9286
2023 Impact Factor: 6.2
2023 SCImago Journal Rankings: 4.501
ISI Accession Number ID: WOS:000623420100033
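To make the two-loop scheme described in the abstract concrete, here is a minimal sketch in Python. The outer loop is tabular Q-learning; the inner loop is a stochastic-approximation pass that estimates conditional value-at-risk (CVaR) of the next-state value through its Rockafellar-Uryasev saddle-point form, CVaR_a(X) = min_eta { eta + E[(X - eta)_+]/(1 - a) }. Everything here (the synthetic MDP, cost table, step sizes, and names such as cvar_inner_loop) is an illustrative assumption, not the authors' published algorithm.

```python
import numpy as np

N_S, N_A = 5, 3     # synthetic state/action space sizes (assumed)
BETA = 0.95         # discount factor (assumed)
ALPHA = 0.9         # CVaR confidence level (assumed)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))  # transition kernel: P[s, a] is a distribution over s'
C = rng.uniform(size=(N_S, N_A))                  # per-step cost table

def sample_next_state(s, a):
    """Hypothetical simulator: draw s' ~ P(. | s, a)."""
    return rng.choice(N_S, p=P[s, a])

def cvar_inner_loop(s, a, V, n_inner=200):
    """Inner loop: stochastic approximation for the saddle point in the
    Rockafellar-Uryasev form CVaR_a(X) = min_eta eta + E[(X - eta)_+]/(1 - a),
    applied to X = V(s') with s' ~ P(. | s, a)."""
    eta, risk = 0.0, 0.0
    for n in range(1, n_inner + 1):
        x = V[sample_next_state(s, a)]  # one sample of the next-state value
        step = 1.0 / n
        # stochastic subgradient step in eta for the R-U objective
        eta -= step * (1.0 - float(x > eta) / (1.0 - ALPHA))
        # running average of the objective serves as the risk estimate
        risk += step * (eta + max(x - eta, 0.0) / (1.0 - ALPHA) - risk)
    return risk

def risk_aware_q_learning(n_outer=2000, k=0.75):
    """Outer loop: Q-learning with the risk (here CVaR) of the next-state
    value in place of its expectation; k in (1/2, 1] matches the
    learning-rate exponent appearing in the stated convergence rate."""
    Q = np.zeros((N_S, N_A))
    visits = np.zeros((N_S, N_A))
    for _ in range(n_outer):
        s, a = rng.integers(N_S), rng.integers(N_A)  # uniform exploration over pairs
        visits[s, a] += 1
        gamma = visits[s, a] ** -k       # polynomial learning rate
        V = Q.min(axis=1)                # greedy values (costs are minimized)
        Q[s, a] += gamma * (C[s, a] + BETA * cvar_inner_loop(s, a, V) - Q[s, a])
    return Q

Q = risk_aware_q_learning()
print("greedy risk-aware policy:", Q.argmin(axis=1))
```

CVaR is used above only because its saddle-point representation is compact; per the abstract, the same inner-loop idea covers other risk measures with such representations, including the optimized certainty equivalent and absolute semideviation.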

DC Field: Value
dc.contributor.author: Huang, W
dc.contributor.author: Haskell, WB
dc.date.accessioned: 2021-10-20T10:14:48Z
dc.date.available: 2021-10-20T10:14:48Z
dc.date.issued: 2021
dc.identifier.citation: IEEE Transactions on Automatic Control, 2021, v. 66 n. 3, p. 1314-1320
dc.identifier.issn: 0018-9286
dc.identifier.uri: http://hdl.handle.net/10722/305821
dc.description.abstract: We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε > 0 on the optimal Q-value estimation gap and a learning rate k ∈ (1/2, 1], the overall convergence rate of our algorithm is Ω((ln(1/δε)/ε^2)^(1/k) + (ln(1/ε))^(1/(1-k))) with probability at least 1 - δ.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=9
dc.relation.ispartof: IEEE Transactions on Automatic Control
dc.subject: Markov decision processes (MDPs)
dc.subject: risk measure
dc.subject: saddle point
dc.subject: stochastic approximation
dc.subject: Q-learning
dc.title: Stochastic Approximation for Risk-Aware Markov Decision Processes
dc.type: Article
dc.identifier.email: Huang, W: huangwj@hku.hk
dc.identifier.authority: Huang, W=rp02898
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/TAC.2020.2989702
dc.identifier.scopus: eid_2-s2.0-85102065067
dc.identifier.hkuros: 327215
dc.identifier.volume: 66
dc.identifier.issue: 3
dc.identifier.spage: 1314
dc.identifier.epage: 1320
dc.identifier.isi: WOS:000623420100033
dc.publisher.place: United States
