News RSS feed

Committing to the wrong artificial delegate in a collective-risk dilemma is better than directly committing mistakes

13 May 2024

New research from computer scientists Inês Terrucha, Elias Fernández Domingos, Pieter Simoens, and Tom Lenaerts at the Vrije Universiteit Brussel, Université Libre de Bruxelles, and UC Berkeley’s Center for Human-Compatible AI

Reinforcement Learning from Human Feedback and Active Teacher Selection (RLHF and ATS)

30 Apr 2024

CHAI PhD student Rachel Freedman gave a presentation at Stanford University on critical new developments in AI safety, focusing on problems and potential solutions with Reinforcement Learning from Human Feedback (RLHF).

Reinforcement Learning Safety Workshop (RLSW) @ RLC 2024

15 Apr 2024

Important Dates
Paper submission deadline: May 10, 2024 (AoE)
Paper acceptance notification: May 23, 2024

Regulating Advanced Artificial Agents

06 Apr 2024

Governance frameworks should address the prospect of AI systems that cannot be safely tested.

CHAI Policy Internship

02 Apr 2024

Deadline: April 17, 2024. Policy internship at the Center for Human-Compatible Artificial Intelligence.

Embracing AI That Reflects Human Values: Insights from Brian Christian’s Journey

28 Mar 2024

Discover how acclaimed author Brian Christian’s quest for deeper understanding could lead to AI systems that truly mirror human values and decisions.

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

05 Mar 2024

Researchers at the Center for Human-Compatible AI (CHAI) at the University of California, Berkeley, have embarked on a study that brings to light the nuanced challenges encountered when AI systems learn from human feedback, especially under conditions of partial observability.

The Prosocial Ranking Challenge – $60,000 in prizes for better social media algorithms

18 Jan 2024

Deadline extended! First round submissions now due April 15th. See below.

Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

16 Jan 2024

How can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? The authors of this paper examine the problem of determining demonstration sufficiency.

ALMANACS: A Simulatability Benchmark for Language Model Explainability

20 Dec 2023

How do we measure the efficacy of language model explainability methods? The authors of this paper present ALMANACS, a language model explainability benchmark that scores explainability methods on simulatability.
