Research

CHAI aims to reorient the foundations of AI research toward the development of provably beneficial systems. Currently, we know of no way to specify a formula for human values that would provably benefit humanity if it were instated as the objective of a powerful AI system. In short, any initial formal specification of human values is bound to be wrong in important ways. This means we need to somehow represent uncertainty in the objectives of AI systems. This way of formulating objectives stands in contrast to the standard model for AI, in which the AI system's objective is assumed to be known completely and correctly.

Therefore, much of CHAI's research effort to date has focused on developing and communicating a new model of AI development, in which AI systems should be uncertain of their objectives and should defer to humans in light of that uncertainty. However, our interests extend to a variety of other problems in the development of provably beneficial AI systems. Our areas of greatest focus so far have been the foundations of rational agency and causality, value alignment and inverse reinforcement learning, human-robot cooperation, multi-agent perspectives and applications, and models of bounded or imperfect rationality. Other areas of interest to our mission include adversarial training and testing for ML systems, various AI capabilities, topics in cognitive science, ethics for AI and AI development, robust inference, learning, and planning, security problems and solutions, and transparency and interpretability methods.

In addition to purely academic work, CHAI strives to produce intellectual outputs for general audiences. We also advise governments and international organizations on policies relevant to ensuring that AI technologies will benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.

Below is a list of CHAI's publications since we began operating in 2016. Many of our publications are collaborations with other AI research groups; we view collaborations as key to integrating our perspectives into mainstream AI research.

1. Overviews

1.1. Books

1.2. Overviews of societal-scale risks from AI

2. Core topics

2.1. Foundations of rational agency & causality

2.2. Value alignment and inverse reinforcement learning

2.3. Human-robot cooperation

2.4. Multi-agent perspectives and applications

2.5. Models of bounded or imperfect rationality

3. Other topics

3.1. Adversarial training and testing

3.2. AI capabilities, uncategorized

3.3. Cognitive science, uncategorized

3.4. Ethics for AI and AI development

3.5. Robust inference, learning, and planning

3.6. Security problems and solutions

3.7. Transparency & interpretability