TR2023-143
Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning
Queeney, J., Benosman, M., "Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine, Eds., December 2023, pp. 1659-1680.
@inproceedings{Queeney2023dec,
  author = {Queeney, James and Benosman, Mouhacine},
  title = {Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = 2023,
  editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  pages = {1659--1680},
  month = dec,
  publisher = {Curran Associates, Inc.},
  url = {https://www.merl.com/publications/TR2023-143}
}
Abstract:
Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
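As a rough illustration of the risk-averse view of model uncertainty described in the abstract, the sketch below evaluates a distortion risk measure over a finite set of cost estimates, one per sampled transition model. It is not the paper's implementation: the function names (`distortion_risk`, `expectation`, `cvar`), the equal weighting of models, and the toy cost values are all assumptions made for the example. A concave distortion function `g` with `g(0) = 0` and `g(1) = 1` shifts weight toward the worst outcomes; conditional value-at-risk (CVaR) is one common coherent choice.

```python
import numpy as np


def distortion_risk(values, g):
    """Distortion risk of equally likely cost samples (higher cost = worse).

    `g` is a distortion function on [0, 1] with g(0) = 0 and g(1) = 1.
    A concave `g` places more weight on the largest costs.
    """
    # Sort so the worst (largest) cost comes first.
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    n = x.size
    # Weight on the i-th worst sample is g(i/n) - g((i-1)/n).
    grid = np.arange(n + 1) / n
    weights = np.diff(g(grid))
    return float(np.dot(weights, x))


def expectation(u):
    """Identity distortion: recovers the plain (risk-neutral) average."""
    return u


def cvar(alpha):
    """Distortion for CVaR: averages the worst `alpha` fraction of samples."""
    return lambda u: np.minimum(u / alpha, 1.0)


# Hypothetical expected costs of one policy under four sampled transition models.
costs = [1.0, 2.0, 3.0, 10.0]

risk_neutral = distortion_risk(costs, expectation)  # 4.0
risk_averse = distortion_risk(costs, cvar(0.5))     # 6.5, mean of the two worst
```

With these toy numbers the risk-neutral average is 4.0, while CVaR at level 0.5 averages only the two worst models and gives 6.5. The risk-averse value therefore reflects the unfavorable transition models more strongly, which is the intuition behind robustness without an explicit minimax search.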