
Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems

03 Jan 2023

One prevalent alignment strategy is to 1) capture “what we want” in a loss function to a very high degree, 2) use that loss function to train the AI, and 3) get the AI to exclusively care about optimizing that objective.

Fast Deliberation is Related to Unconditional Behaviour in Iterated Prisoners’ Dilemma Experiments. Scientific Reports, 12(1), 1-10.

29 Dec 2022

Is the speed with which people act in a strategic situation related to their social preferences?

Competence-Aware Systems

12 Dec 2022

The paper introduces a novel approach to building competence-aware systems (CAS) that reason about their own competence across multiple levels of autonomy.
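
As a rough illustration of the idea (the autonomy labels, thresholds, and hard-coded competence estimates below are assumptions for this sketch, not the paper's formalism), a competence-aware agent might choose an autonomy level per situation from an estimate of how likely it is to handle that situation correctly, deferring to a human when that estimate is low:

```python
# Illustrative sketch only: pick an autonomy level from a self-competence estimate.
from dataclasses import dataclass

AUTONOMY_LEVELS = ["no_autonomy", "supervised", "unsupervised"]

@dataclass
class CompetenceModel:
    # Estimated probability that the agent handles this kind of situation correctly.
    success_estimates: dict

    def estimate(self, situation: str) -> float:
        return self.success_estimates.get(situation, 0.0)

def choose_autonomy_level(model: CompetenceModel, situation: str) -> str:
    p = model.estimate(situation)
    if p >= 0.95:
        return "unsupervised"   # act alone
    if p >= 0.7:
        return "supervised"     # act, but a human monitors and can intervene
    return "no_autonomy"        # hand control to the human

model = CompetenceModel({"clear road": 0.98, "construction zone": 0.6})
for situation in ["clear road", "construction zone", "unmapped detour"]:
    print(situation, "->", choose_autonomy_level(model, situation))
```

In the full approach, such competence estimates would be learned and refined from experience and feedback rather than hard-coded as they are in this sketch.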

Trade Regulation Rule on Commercial Surveillance and Data Security Rulemaking

05 Dec 2022

The authors argue that regulators and policymakers today focus on litigating isolated algorithmic harms, such as model bias or privacy violations, but that this agenda neglects the persistent effects of AI systems on consumer populations.

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

18 Nov 2022

The paper “Time-Efficient Reward Learning via Visually Assisted Cluster Ranking” was accepted at the Human-in-the-loop Learning (HILL) Workshop, which will be held at NeurIPS 2022.

Brian Christian’s “The Alignment Problem” wins the Excellence in Science Communication Award From Eric & Wendy Schmidt and the National Academies

01 Nov 2022

Brian Christian has been named one of the inaugural recipients of the National Academies Eric and Wendy Schmidt Awards for Excellence in Science Communication.

The Shard Theory of Human Values

14 Oct 2022

Starting from neuroscientifically grounded theories such as predictive processing and reinforcement learning, Quintin Pope and Alex Turner (a postdoc at CHAI) set out a theory of what human values are and how they form within the human brain. In "The Shard Theory of Human Values," published on the AI Alignment Forum, they develop the idea that human values are contextually activated influences on decision-making ("shards") formed by reinforcement events. For example, a person's reward center activates when they see their friend smile. This triggers credit assignment, which upweights and generalizes the thoughts that led to the reward event (such as "deciding to hang out with the friend" or "telling a joke"), creating a contextual influence (a "friendship-shard") that, when the friend is nearby, inclines the person to hang out with them again. The theory explains a range of human biases and "quirks" as consequences of shard dynamics.
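
As a toy sketch of that feedback loop (purely illustrative; the class names, reward values, and learning rule below are assumptions, not anything from the post), a reward event can upweight a contextual influence on future decisions:

```python
# Toy illustration: contextually activated "shards" whose influence on decisions
# is strengthened by credit assignment after a reward event.
import random
from collections import defaultdict

class ToyShardAgent:
    def __init__(self, lr=0.5):
        self.lr = lr
        # Shard strength: (context, action) -> influence weight.
        self.shards = defaultdict(float)

    def act(self, context, actions):
        # Shards bias the choice toward actions they favour in this context.
        weights = [1.0 + max(self.shards[(context, a)], 0.0) for a in actions]
        return random.choices(actions, weights=weights)[0]

    def credit_assignment(self, context, action, reward):
        # A reward event upweights the decision that preceded it, turning it
        # into a contextual influence (a "shard").
        self.shards[(context, action)] += self.lr * reward

agent = ToyShardAgent()
for _ in range(20):
    action = agent.act("friend nearby", ["hang out", "stay home"])
    reward = 1.0 if action == "hang out" else 0.0  # friend smiles -> reward center fires
    agent.credit_assignment("friend nearby", action, reward)

print(dict(agent.shards))  # the "friendship-shard" now biases behaviour when the friend is nearby
```

Over repeated reward events the weight on "hang out" grows, so the context "friend nearby" increasingly biases the agent toward that choice; this is the sense in which a shard is a contextually activated influence rather than a global objective.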

Social Media is Polluting Society. Content Moderation Alone Won’t Fix the Problem

10 Oct 2022

In "Social media is polluting society. Content moderation alone won't fix the problem," published in the MIT Technology Review, CHAI's Thomas Krendl Gilbert argues that even perfectly implemented content moderation would still miss a whole host of issues that are often portrayed as moderation problems but really are not. To address those non-speech issues, he proposes a new strategy: treat social media companies as potential polluters of the social fabric, and directly measure and mitigate the effects their choices have on human populations. That means establishing a policy framework, perhaps through something akin to an Environmental Protection Agency or Food and Drug Administration for social media, that can identify and evaluate the societal harms these platforms generate. If those harms persist, that agency could be empowered to enforce its policies. But to transcend the limitations of content moderation, such regulation would have to be motivated by clear evidence and have a demonstrable impact on the problems it purports to solve.

Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems

03 Oct 2022

In "Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems," CHAI's Siddharth Srivastava argues that reinforcement learning in problems with symbolic state spaces is challenging because it requires reasoning over long horizons. The paper presents a new approach that combines relational abstractions with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus entirely different state spaces, enabling zero-shot transfer without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that the method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects.
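
To make the transfer idea concrete (a minimal sketch under assumed predicates, with a linear approximator standing in for the paper's learned deep model; none of these names come from the paper), a Q-function defined over relational features of a symbolic state, rather than over ground states, applies unchanged to problems with different object names and counts:

```python
# Minimal sketch: a Q-function over relational features of a symbolic state,
# so it is indifferent to object names and object counts.
from typing import Dict, FrozenSet, Tuple

State = FrozenSet[Tuple[str, ...]]  # a set of ground atoms, e.g. ("on", "a", "b")
PREDICATES = ("on", "clear", "holding")

def relational_features(state: State) -> Tuple[float, ...]:
    # Abstract away object identities: keep only per-predicate counts.
    counts: Dict[str, int] = {}
    for atom in state:
        counts[atom[0]] = counts.get(atom[0], 0) + 1
    return tuple(float(counts.get(p, 0)) for p in PREDICATES)

class GeneralizedQ:
    """Q(s, a) approximated per action schema as a linear function of features."""
    def __init__(self):
        self.weights: Dict[str, list] = {}

    def value(self, state: State, action_schema: str) -> float:
        w = self.weights.get(action_schema, [0.0] * len(PREDICATES))
        return sum(wi * fi for wi, fi in zip(w, relational_features(state)))

    def update(self, state: State, action_schema: str, target: float, lr: float = 0.05) -> None:
        w = self.weights.setdefault(action_schema, [0.0] * len(PREDICATES))
        error = target - self.value(state, action_schema)
        for i, fi in enumerate(relational_features(state)):
            w[i] += lr * error * fi

# Train on a small instance, then query a larger one with different object names:
# because the features ignore names and scale with counts, no retraining is needed.
q = GeneralizedQ()
small = frozenset({("on", "a", "b"), ("clear", "a"), ("clear", "c")})
for _ in range(50):
    q.update(small, "unstack", target=1.0)
large = frozenset({("on", "x1", "x2"), ("on", "x3", "x4"),
                   ("clear", "x1"), ("clear", "x3"), ("clear", "x5")})
print(q.value(large, "unstack"))  # zero-shot estimate on the larger instance
```

The paper learns the abstractions and the Q-function with deep learning; the sketch only illustrates the underlying point that the relational representation, not the ground state space, is what transfers across problem instances.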

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis

26 Sep 2022

The paper catalogues the values that seem most relevant to AI-driven content personalization algorithms.
