Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

18 Nov 2022

The paper “Time-Efficient Reward Learning via Visually Assisted Cluster Ranking” was accepted at the Human-in-the-loop Learning (HILL) Workshop, which will be held at NeurIPS 2022. It was written by CHAI's Micah Carroll and Anca Dragan, along with David Zhang and Andreea Bobu.

The paper starts from the observation that one of the most successful paradigms for reward learning uses human feedback in the form of comparisons. Although these methods hold promise, human comparison labeling is expensive and time-consuming, which makes it a major bottleneck to their broader applicability. The paper's key insight is that human time can be used far more effectively by batching comparisons together rather than having the human label each comparison individually. To do so, the authors use dimensionality reduction and visualization techniques to give the human an interactive GUI that displays the state space, in which the user can label subportions of that space. Across some simple MuJoCo tasks, the paper shows that this high-level approach holds promise and can greatly increase the performance of the resulting agents given the same amount of human labeling time.
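To make the batching idea concrete, here is a minimal sketch of how a single ranking over groups of states could be expanded into many pairwise comparisons. It is an illustration under stated assumptions, not the paper's implementation: the function name `comparisons_from_cluster_ranking`, the `cluster_of` mapping, and the toy data are all hypothetical, and the sketch assumes the human has already grouped states (for example by selecting regions in a 2D projection) and ordered those groups from least to most preferred.

```python
from itertools import combinations


def comparisons_from_cluster_ranking(cluster_of, ranking):
    """Expand a ranking over clusters into pairwise state comparisons.

    cluster_of: dict mapping state id -> cluster id (hypothetical output of
        a grouping step, e.g. regions selected in a 2D projection).
    ranking: list of cluster ids, ordered from least to most preferred.
    Returns a list of (worse_state, better_state) pairs.
    """
    rank = {cluster: position for position, cluster in enumerate(ranking)}
    pairs = []
    for a, b in combinations(sorted(cluster_of), 2):
        rank_a, rank_b = rank[cluster_of[a]], rank[cluster_of[b]]
        if rank_a < rank_b:
            pairs.append((a, b))
        elif rank_b < rank_a:
            pairs.append((b, a))
        # States in the same cluster produce no comparison.
    return pairs


# Toy example (assumed data): six states in three clusters.
cluster_of = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
pairs = comparisons_from_cluster_ranking(cluster_of, ["A", "B", "C"])
print(len(pairs))
```

In this toy case, one ranking of three clusters stands in for every cross-cluster pair of states, which is the sense in which batching could save labeling time compared with labeling each pair individually.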