CHAI paper submitted to NeurIPS

24 May 2019

Adam Gleave, Michael Dennis, Neel Kant, Cody Wild, Sergey Levine, and Stuart Russell submitted their paper "Adversarial Policies: Attacking Deep Reinforcement Learning" to NeurIPS 2019. The abstract can be found below:

Deep reinforcement learning (RL) has achieved great success in recent years and is one of the more likely routes to highly capable AI systems. However, deep RL policies are difficult to understand or verify. Prior work has shown deep RL policies are vulnerable to adversarial perturbations to their observations, but these observations need not be physically realistic. In this work, the authors show that it is possible to attack a victim deep RL policy simply by controlling another agent in a shared environment, taking actions to create natural observations that are adversarial to the victim policy. This has direct security implications when deploying deep RL policies in hostile environments, and provides a new method for worst-case testing in benign environments. The authors intend to explore defences such as adversarial training in future work.