NeurIPS Paper: Replay-Guided Adversarial Environment Design

10 Nov 2021

CHAI’s Michael Dennis co-authored the paper together with Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, and Tim Rocktäschel. Read the abstract here:

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent’s capabilities. This approach leads to the emergence of diverse training environments that challenge the policy to be more robust and generalizable. This paper casts Prioritized Level Replay (PLR) as a method for UED, arguing that by curating completely random levels, PLR can generate novel and complex levels. Furthermore, the paper theoretically motivates a counterintuitive modification to PLR that improves performance by training on less data. The experiments confirm that the resulting method, PLR⊥, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR⊥ improves the performance of PAIRED, from which it inherits its theoretical framework.
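To make the curation idea concrete, here is a minimal, hypothetical Python sketch of a PLR-style loop in the spirit of PLR⊥: newly generated random levels are only evaluated and curated into a prioritized buffer (scored, for example, by a regret estimate), while gradient updates are performed only on levels replayed from that buffer. The names used here (PLRBuffer, make_random_level, estimate_regret, agent.update_on) are illustrative assumptions, not the authors' implementation.

```python
import random

class PLRBuffer:
    """Illustrative prioritized level buffer (a sketch, not the paper's code)."""

    def __init__(self, capacity=128, temperature=1.0):
        self.capacity = capacity
        self.temperature = temperature
        self.levels = []   # stored level seeds / configurations
        self.scores = []   # e.g. regret estimates for each stored level

    def maybe_add(self, level, score):
        # Curate: keep a level only if the buffer has room or the level's
        # score beats the weakest level currently stored.
        if len(self.levels) < self.capacity:
            self.levels.append(level)
            self.scores.append(score)
            return
        worst = min(range(len(self.scores)), key=self.scores.__getitem__)
        if score > self.scores[worst]:
            self.levels[worst], self.scores[worst] = level, score

    def sample(self):
        # Rank-based prioritized sampling: higher-scoring levels are
        # replayed more often, with temperature controlling sharpness.
        ranks = sorted(range(len(self.scores)), key=lambda i: -self.scores[i])
        weights = [(1.0 / (r + 1)) ** (1.0 / self.temperature)
                   for r in range(len(ranks))]
        idx = random.choices(ranks, weights=weights, k=1)[0]
        return self.levels[idx]

def training_loop(agent, buffer, make_random_level, estimate_regret, steps=1000):
    """Sketch of a PLR⊥-style loop: random levels are evaluated to curate the
    buffer, but the agent is updated only on replayed levels (hypothetical
    agent and helper functions assumed)."""
    for _ in range(steps):
        if not buffer.levels or random.random() < 0.5:
            level = make_random_level()
            score = estimate_regret(agent, level)  # evaluate only, no gradient step
            buffer.maybe_add(level, score)
        else:
            level = buffer.sample()
            agent.update_on(level)                 # train only on replayed levels
```

The key design choice illustrated here is that training on fewer levels, restricted to the curated replay buffer, is what distinguishes PLR⊥ from training on every randomly generated level.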