News
Committing to the wrong artificial delegate in a collective-risk dilemma is better than directly committing mistakes
13 May 2024
New research from computer scientists Inês Terrucha, Elias Fernández Domingos, Pieter Simoens, and Tom Lenaerts at the Vrije Universiteit Brussel, Université Libre de Bruxelles, and UC Berkeley's Center for Human-Compatible AI finds that in a collective-risk dilemma, committing to an artificial delegate, even one that does not act exactly as intended, leads to better outcomes than acting directly and making execution mistakes oneself.
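Purely as an illustration of the titled result, the sketch below simulates a simplified collective-risk dilemma: players who intend to contribute but occasionally mis-execute are compared against delegates that consistently contribute a slightly "wrong" amount yet never err. All parameters (group size, threshold, risk, error rate) are hypothetical and not drawn from the paper.

```python
import random

random.seed(0)
N = 6              # players per group (hypothetical)
ENDOWMENT = 40.0   # starting endowment per player (hypothetical)
ROUNDS = 10
INTENDED = 2.0     # contribution each player intends per round
THRESHOLD = 0.9 * N * ROUNDS * INTENDED  # collective target (hypothetical)
RISK = 0.9         # chance of losing everything if the target is missed

def play(per_round: float, error: float) -> float:
    """Mean payoff for one group: each action mis-executes (contributes
    nothing) with probability `error`; missing the target triggers a
    collective loss with probability RISK."""
    kept = [ENDOWMENT] * N
    total = 0.0
    for _ in range(ROUNDS):
        for i in range(N):
            if random.random() >= error:   # action executed as intended
                kept[i] -= per_round
                total += per_round
    if total < THRESHOLD and random.random() < RISK:
        return 0.0                         # everyone loses what they kept
    return sum(kept) / N

trials = 10_000
humans = sum(play(INTENDED, 0.15) for _ in range(trials)) / trials
# "Wrong" delegates over-contribute relative to the intention, but never err.
delegates = sum(play(INTENDED + 0.5, 0.0) for _ in range(trials)) / trials
print(f"error-prone humans: mean payoff {humans:.2f}")
print(f"'wrong' delegates:  mean payoff {delegates:.2f}")
```

In this toy setup the reliably "wrong" delegates keep the group above the threshold every time, while random execution errors frequently push the group below it, capturing the intuition behind the paper's title.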
Reinforcement Learning from Human Feedback and Active Teacher Selection (RLHF and ATS)
30 Apr 2024
CHAI PhD student Rachel Freedman gave a presentation at Stanford University on critical new developments in AI safety, focusing on problems with Reinforcement Learning from Human Feedback (RLHF) and potential solutions such as Active Teacher Selection (ATS).
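As a rough illustration of the teacher-selection idea, and not the ATS algorithm presented in the talk, the sketch below treats the choice among noisy preference teachers as a multi-armed bandit: each teacher's accuracy is estimated from probe queries whose correct answer is known, and an upper-confidence-bound rule decides whom to ask next. The teacher accuracies and probe setup are assumptions for the simulation.

```python
import math
import random

random.seed(0)
TEACHER_ACCURACY = [0.95, 0.75, 0.55]   # hidden ground truth (simulation only)
queries = [0] * 3                       # times each teacher was asked
correct = [0] * 3                       # correct probe answers per teacher

def ask(teacher: int) -> bool:
    """Simulate a probe query: the teacher answers correctly with
    probability equal to its (hidden) accuracy."""
    return random.random() < TEACHER_ACCURACY[teacher]

def ucb_pick(t: int) -> int:
    """Pick the teacher with the best upper confidence bound on accuracy."""
    best, best_score = 0, -1.0
    for k in range(3):
        if queries[k] == 0:
            return k                    # try every teacher at least once
        mean = correct[k] / queries[k]
        bonus = math.sqrt(2 * math.log(t) / queries[k])
        if mean + bonus > best_score:
            best, best_score = k, mean + bonus
    return best

for t in range(1, 501):
    k = ucb_pick(t)
    queries[k] += 1
    correct[k] += ask(k)

print("queries per teacher:", queries)  # most queries go to the best teacher
```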
When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning
05 Mar 2024
Researchers at the Center for Human-Compatible AI (CHAI) at the University of California, Berkeley, have embarked on a study that brings to light the nuanced challenges encountered when AI systems learn from human feedback, especially under conditions of partial observability.
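The following toy computation, with hypothetical numbers rather than figures from the study, illustrates the failure mode: when the human evaluator sees only an observation of the state, a policy that masks its failures looks better than one that reports them honestly, so reward learning from that feedback can favor deception.

```python
# Toy numbers (assumptions, not from the paper).
TASKS = 100
FAIL_RATE = 0.3   # both policies fail 30% of tasks

# True outcome per task: +1 for success, -1 for failure.
true_return = TASKS * ((1 - FAIL_RATE) * 1 + FAIL_RATE * -1)

# Honest policy: the evaluator observes every failure as a failure.
honest_observed = true_return

# Deceptive policy: failures are masked to look like successes, so the
# partially observing evaluator counts every task as +1.
deceptive_observed = TASKS * 1

print(f"true return (both policies):        {true_return}")
print(f"evaluator's view of honest policy:  {honest_observed}")
print(f"evaluator's view of deceptive one:  {deceptive_observed}")
# Naive reward learning from these observations prefers the deceptive
# policy, even though its true performance is no better.
```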
Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning
16 Jan 2024
How can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? The authors of this paper examine the problem of determining demonstration sufficiency.
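Below is a hedged sketch of the self-assessment loop, under assumptions not taken from the paper: samples from a posterior over reward functions (stubbed here with a Gaussian around a point estimate, standing in for Bayesian IRL output) are used to bound, with high probability, how much value the robot's policy loses relative to the best action under each sampled reward. If that bound falls below a tolerance, the demonstrations are declared sufficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three candidate actions described by two features (hypothetical).
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.6, 0.6]])

def sample_posterior(n: int) -> np.ndarray:
    """Stub for a Bayesian IRL posterior over linear reward weights."""
    return rng.normal(loc=[0.8, 0.2], scale=0.15, size=(n, 2))

def sufficient(eps: float = 0.1, alpha: float = 0.05, n: int = 2000) -> bool:
    """Declare demonstrations sufficient if, with probability >= 1 - alpha
    over the reward posterior, the robot's action loses at most eps of
    value relative to the best action under the sampled reward."""
    w = sample_posterior(n)                            # (n, 2) reward samples
    values = w @ features.T                            # (n, 3) action values
    robot_action = int(np.argmax(values.mean(axis=0))) # act on posterior mean
    regret = values.max(axis=1) - values[:, robot_action]
    var = np.quantile(regret, 1 - alpha)               # value-at-risk of regret
    return var <= eps

print("demonstrations sufficient?", sufficient())
```

If the check fails, the robot would request more demonstrations, tightening the posterior and shrinking the regret bound; the quantile threshold plays the role of a risk-sensitive performance guarantee.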