CHAI 2017 Annual Workshop

The Center for Human-Compatible Artificial Intelligence (CHAI) hosted its first annual workshop on Friday, May 5 and Saturday, May 6, 2017. CHAI’s mission is to reorient the field of artificial intelligence toward developing systems that are provably beneficial to humans, and our annual workshop is designed to advance discussion and research toward that purpose.

The event itself was private, but the schedule is listed here for posterity.


Friday, May 5

8am Breakfast
9am Formal Introductions Stuart Russell
9:15am Opening Stuart Russell
10am Humans Should Not Be Obstacles Anca Dragan Download slides (PDF)
10:30am Break
11am On Iterated Inverse Reinforcement Learning for Safe AI Satinder Singh Baveja Download slides (PDF)
11:30am Inverse Reward Design Dylan Hadfield-Menell Download slides (KEY)
12pm Lunch
1:30pm Working, Earning, and Learning in the Era of the Intelligent Machine John Zysman
2pm Break
2:30pm Coordinating artificial intelligence with human normativity Gillian Hadfield Download slides (PPTX)
3pm Moral Responsibility, Blameworthiness, and Intention: In Search of Formal Definitions Joe Halpern Download slides (PDF)
3:30pm Breakout explanation and relocation
4pm Technical beginnings breakouts
5:30pm Dinner

Saturday, May 6

8am Breakfast
9am Generative Adversarial Imitation Learning Stefano Ermon Download slides (PPTX)
9:30am Semi-supervised reinforcement learning Paul Christiano Download slides (KEY)
10am Break
10:15am Meta-learning: Breaking the grad student descent barrier Pieter Abbeel
10:45am Break
11am Combinatorial Questions for AI Safety Laurent Orseau Download slides (PDF)
11:30am Algorithms Reasoning About Algorithms Andrew Critch
12pm Lunch
1:30pm Breakout reports
2:30pm Break/relocation
2:45pm Technical beginnings breakouts
4:15pm Break/relocation
4:30pm Closing Panel
5:30pm Dinner

Technical beginnings breakouts

Topic Facilitator
Preference Aggregation: How should we aggregate the preferences of multiple humans/institutions? For example, via:
  • Average utilitarianism?
  • Total utilitarianism?
  • Maximin?
Daniel Filan
"Actual Humans": What idiosyncratic properties of human beings, as distinguished from theoretical rational agents, bear on the relationship between humans and AI systems? For example,
  • Bounded rationality
  • Biases
  • Inconsistencies
  • Self-conflict
Smitha Milli
Single-principal value alignment: What problems arise in value alignment for a single human/machine system? For example,
  • Reward hacking
  • Corrigibility
Dylan Hadfield-Menell
Interactive control: What models of control, or forms of interaction, should we employ for developing provably beneficial AI systems? For example,
  • Active learning (queries from the algorithm to a human)
  • Transparency (queries from a human to the algorithm)
Jaime Fisac
Minimum-viable world-takeover capabilities: What capabilities in a future AI or machine learning system would confer on it, or its owners, an extreme power advantage over the rest of humanity?
Tsvi Benson-Tilsen