Conference Paper: Risk-aware Q-learning for Markov decision processes

Title: Risk-aware Q-learning for Markov decision processes
Authors: Huang, Wenjie; Haskell, William B.
Issue Date: 2018
Citation: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933
Abstract: We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures.
Persistent Identifier: http://hdl.handle.net/10722/308924
ISI Accession Number: WOS:000424696904120
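
The abstract above centers on replacing the risk-neutral Bellman backup in Q-learning with a general minimax function estimated by stochastic approximation. A standard example of a risk measure admitting such a minimax form is the Rockafellar-Uryasev representation of CVaR: CVaR_α(X) = min_η { η + (1/α) E[(X - η)_+] }. For orientation only, the sketch below is plain risk-neutral tabular Q-learning on a toy finite MDP with a diminishing step size (the stochastic-approximation ingredient); it is not the paper's algorithm, and the MDP, exploration rate, and step-size schedule are all hypothetical choices made for illustration.

    import numpy as np

    # Hypothetical 2-state, 2-action discounted MDP, invented for illustration.
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 2, 2, 0.9
    P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[s, a] = distribution over next states
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],                 # R[s, a] = immediate reward
                  [0.0, 2.0]])

    Q = np.zeros((n_states, n_actions))
    s = 0
    for t in range(1, 50_001):
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next = int(rng.choice(n_states, p=P[s, a]))
        # Diminishing step size, as required by stochastic approximation.
        step = 1.0 / (1.0 + 0.01 * t)
        # Risk-neutral target: a risk-aware variant would replace this backup
        # with a minimax risk operator such as the CVaR form above.
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += step * (target - Q[s, a])
        s = s_next

    print(np.round(Q, 3))

Under the usual Robbins-Monro step-size conditions this iteration converges to the optimal Q-values; the paper's contribution is finite-time bounds for the risk-aware analogue of this update.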

 

DC Field                  Value
dc.contributor.author     Huang, Wenjie
dc.contributor.author     Haskell, William B.
dc.date.accessioned       2021-12-08T07:50:25Z
dc.date.available         2021-12-08T07:50:25Z
dc.date.issued            2018
dc.identifier.citation    2017 IEEE 56th Annual Conference on Decision and Control (CDC), Melbourne, Australia, 12-15 December 2017. In Conference Proceedings, 2018, p. 4928-4933
dc.identifier.uri         http://hdl.handle.net/10722/308924
dc.description.abstract   We are interested in developing a reinforcement learning algorithm to tackle risk-aware sequential decision-making problems. The model we investigate is a discounted infinite-horizon Markov decision process with finite state and action spaces. Our algorithm is based on estimating a general minimax function with stochastic approximation, and we show that several risk measures fall within this form. We derive finite-time bounds for this algorithm by combining stochastic approximation with the theory of risk-aware dynamic programming. Finally, we present extensions to several variations of risk measures.
dc.language               eng
dc.relation.ispartof      2017 IEEE 56th Annual Conference on Decision and Control (CDC)
dc.title                  Risk-aware Q-learning for Markov decision processes
dc.type                   Conference_Paper
dc.description.nature     link_to_subscribed_fulltext
dc.identifier.doi         10.1109/CDC.2017.8264388
dc.identifier.scopus      eid_2-s2.0-85046164297
dc.identifier.spage       4928
dc.identifier.epage       4933
dc.identifier.isi         WOS:000424696904120
