News RSS feed

Fairness and Sequential Decision Making: Limits, Lessons, and Opportunities

31 Jan 2023

As automated decision making and decision assistance systems become common in everyday life, research on the prevention or mitigation of potential harms that arise from decisions made by these systems has proliferated.

Active Reward Learning from Multiple Teachers

16 Jan 2023

CHAI intern Peter Barnett, PhD student mentor Rachel Freedman, Justin Svegliato, and Stuart Russell’s new paper, “Active Reward Learning from Multiple Teachers”, was accepted for an oral presentation at the AAAI 2023 Workshop on AI Safety.

How to Use ChatGPT and Still Be a Good Person

10 Jan 2023

Brian Christian was interviewed by New York Times reporter Brian X. Chen about ChatGPT.

Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems

03 Jan 2023

One prevalent alignment strategy is to 1) capture “what we want” in a loss function to a very high degree, 2) use that loss function to train the AI, and 3) get the AI to exclusively care about optimizing that objective.
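Steps 1) and 2) of that strategy can be pictured with a minimal numeric sketch, where a hand-written loss stands in for “what we want” and gradient descent stands in for training. (This toy is an illustration, not anything from the post; step 3, getting the system to actually care about the objective rather than merely score well on it, is precisely the part no snippet can show.)

```python
# Step 1: a toy proxy loss for "what we want" -- be close to a target value.
def loss(theta, target=3.0):
    return (theta - target) ** 2

def grad(theta, target=3.0):
    # Derivative of the loss with respect to theta.
    return 2 * (theta - target)

# Step 2: "train the AI" by minimizing that loss with gradient descent.
theta = 0.0
for _ in range(100):
    theta -= 0.1 * grad(theta)

print(round(theta, 3))  # converges near 3.0
```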

Fast Deliberation is Related to Unconditional Behaviour in Iterated Prisoners’ Dilemma Experiments. Scientific Reports, 12(1), 1-10.

29 Dec 2022

Is the speed with which people act in a strategic situation related to their social preferences?

Competence-Aware Systems

12 Dec 2022

The paper introduces a novel approach to building competence-aware systems (CAS) that reason about their own competence in the form of multiple levels of autonomy.
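As a rough illustration of the idea (an assumption for exposition, not the paper’s actual model), an agent might track an empirical success rate at each autonomy level and select the most autonomous level whose estimated competence clears a confidence threshold:

```python
# Toy sketch of competence-aware level selection; names and thresholds
# are illustrative assumptions, not the CAS formalism from the paper.
LEVELS = ["no_autonomy", "supervised", "unsupervised"]  # least -> most autonomous

class ToyCompetenceAwareAgent:
    def __init__(self, confidence_threshold=0.8):
        self.threshold = confidence_threshold
        # (successes, attempts) per autonomy level, seeded optimistically.
        self.stats = {level: [1, 1] for level in LEVELS}

    def record(self, level, success):
        # Update the running success count for one autonomy level.
        self.stats[level][1] += 1
        self.stats[level][0] += int(success)

    def competence(self, level):
        successes, attempts = self.stats[level]
        return successes / attempts

    def choose_level(self):
        # Prefer the most autonomous level whose estimated competence
        # clears the threshold; fall back to full supervision otherwise.
        for level in reversed(LEVELS):
            if self.competence(level) >= self.threshold:
                return level
        return LEVELS[0]

agent = ToyCompetenceAwareAgent()
agent.record("unsupervised", success=False)  # failures lower top-level competence
agent.record("unsupervised", success=False)
print(agent.choose_level())  # supervised
```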

Trade Regulation Rule on Commercial Surveillance and Data Security Rulemaking

05 Dec 2022

The authors argue that today, regulators and policymakers focus on litigating isolated algorithmic harms, such as model bias or privacy violations, and that this agenda neglects the persistent effects of AI systems on consumer populations.

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

18 Nov 2022

The paper “Time-Efficient Reward Learning via Visually Assisted Cluster Ranking” was accepted at the Human-in-the-loop Learning (HILL) Workshop, which will be held at NeurIPS 2022.

Brian Christian’s “The Alignment Problem” wins the Excellence in Science Communication Award From Eric & Wendy Schmidt and the National Academies

01 Nov 2022

Brian Christian has been named one of the inaugural recipients of the National Academies Eric and Wendy Schmidt Awards for Excellence in Science Communication.

The Shard Theory of Human Values

14 Oct 2022

Starting from neuroscientifically grounded theories such as predictive processing and reinforcement learning, Quintin Pope and Alex Turner (a postdoc at CHAI) set out a theory of what human values are and how they form within the human brain. In “The Shard Theory of Human Values”, published on the AI Alignment Forum, they develop the idea that human values are contextually activated influences on decision-making (“shards”) formed by reinforcement events. For example, a person’s reward center activates when they see their friend smile; this triggers credit assignment, which upweights and generalizes the thoughts that led to the reward event (such as deciding to hang out with the friend, or telling a joke). The result is a contextual influence (a “friendship shard”) that, when the friend is nearby, makes the person more likely to hang out with their friend again. The theory explains a range of human biases and “quirks” as consequences of shard dynamics.
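The reward-then-credit-assignment loop described above can be sketched as a toy program. Everything here (class names, the update rule, the “shard strength” table) is an illustrative assumption, not the authors’ formalism:

```python
# Toy sketch of shard formation: reward triggers credit assignment, which
# upweights the contextual decision influences ("shards") that led to it.
from collections import defaultdict

class ToyShardAgent:
    def __init__(self, learning_rate=0.5):
        self.lr = learning_rate
        # shards[context][action] = strength of that contextual influence.
        self.shards = defaultdict(lambda: defaultdict(float))

    def act(self, context, actions):
        # Choose the action with the strongest shard support in this context.
        return max(actions, key=lambda a: self.shards[context][a])

    def credit_assignment(self, context, action, reward):
        # Reward upweights the decisions that preceded it, strengthening
        # a context-specific influence on future choices.
        self.shards[context][action] += self.lr * reward

agent = ToyShardAgent()
# Seeing a friend smile after telling a joke reinforces a "friendship shard".
agent.credit_assignment("friend_nearby", "tell_joke", reward=1.0)
agent.credit_assignment("friend_nearby", "stay_home", reward=0.0)
print(agent.act("friend_nearby", ["tell_joke", "stay_home"]))  # tell_joke
```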
