News
Fairness and Sequential Decision Making: Limits, Lessons, and Opportunities
31 Jan 2023
As automated decision making and decision assistance systems become common in everyday life, research on the prevention or mitigation of potential harms that arise from decisions made by these systems has proliferated.
Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems
03 Jan 2023
One prevalent alignment strategy is to 1) capture “what we want” in a loss function to a very high degree, 2) use that loss function to train the AI, and 3) get the AI to exclusively care about optimizing that objective.
Trade Regulation Rule on Commercial Surveillance and Data Security Rulemaking
05 Dec 2022
The authors argue that regulators and policymakers today focus on litigating isolated algorithmic harms, such as model bias or privacy violations, and that this agenda neglects the persistent effects of AI systems on consumer populations.
Brian Christian’s “The Alignment Problem” wins the Excellence in Science Communication Award from Eric & Wendy Schmidt and the National Academies
01 Nov 2022
Brian Christian has been named one of the inaugural recipients of the National Academies Eric and Wendy Schmidt Awards for Excellence in Science Communication.
The Shard Theory of Human Values
14 Oct 2022
Starting from neuroscientifically grounded theories like predictive processing and reinforcement learning, Quintin Pope and Alex Turner (a postdoc at CHAI) set out a theory of what human values are and how they form within the human brain. In “The Shard Theory of Human Values,” published on the AI Alignment Forum, they analyze the idea that human values are contextually activated influences on decision-making (“shards”) formed by reinforcement events. For example, a person’s reward center activates when they see their friend smile. This triggers credit assignment, which upweights and generalizes the thoughts that led to the reward event (like “deciding to hang out with the friend” or “telling a joke”). The result is a contextual influence (a “friendship-shard”) that, when the friend is nearby, influences the person to hang out with their friend again. The theory explains a range of human biases and “quirks” as consequences of shard dynamics.
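The reinforcement loop in the friendship example can be illustrated with a toy sketch. It is not the authors' model: the class `ToyShardAgent`, its `learning_rate` parameter, and the context and thought labels are all hypothetical. It shows only the "reward event upweights the preceding thoughts in that context" step and leaves out generalization entirely.

```python
from collections import defaultdict


class ToyShardAgent:
    """Toy illustration only: contextual weights shaped by reward events."""

    def __init__(self, learning_rate=0.5):
        # learning_rate is an assumed parameter, not taken from the article.
        self.learning_rate = learning_rate
        # weights[context][thought] is the strength of a contextual influence.
        self.weights = defaultdict(lambda: defaultdict(float))

    def reinforce(self, context, recent_thoughts, reward):
        # Crude credit assignment: upweight every thought that preceded
        # the reward event, but only within the context where it occurred.
        for thought in recent_thoughts:
            self.weights[context][thought] += self.learning_rate * reward

    def preferred_thought(self, context, options):
        # Only weights learned in this context influence the choice.
        # Ties are resolved in favor of the first option listed.
        return max(options, key=lambda t: self.weights[context][t])


agent = ToyShardAgent()

# Reward event: the friend smiles after the person hung out and told a joke.
agent.reinforce(
    "friend nearby",
    ["hang out with friend", "tell a joke"],
    reward=1.0,
)

# With the friend nearby, the learned influence favors hanging out again.
choice_near = agent.preferred_thought(
    "friend nearby", ["stay home", "hang out with friend"]
)

# In a different context no influence has formed, so the weights are all zero.
weight_alone = agent.weights["alone"]["hang out with friend"]
```

Here `choice_near` is "hang out with friend" because only that thought was upweighted in the "friend nearby" context, while `weight_alone` stays at zero: the influence is tied to the context in which the reward occurred.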