News
Linear Probe Penalties Reduce LLM Sycophancy
14 Dec 2024
Visiting ETH MSc student Henry Papadatos and Rachel Freedman, the CHAI PhD student supervising the work, published the article “Linear Probe Penalties Reduce LLM Sycophancy” at the NeurIPS SoLaR workshop. The paper demonstrates a generalizable method for reducing unwanted LLM behaviors that RLHF fine-tuning does not sufficiently disincentivize.
Rachel Freedman selected as inaugural Cooperative AI Fellow
30 Nov 2024
Rachel Freedman, a CHAI PhD student, has been selected as one of the inaugural fellows of Cooperative AI’s PhD Fellow Program.
Representative Social Choice: From Learning Theory to AI Alignment
12 Nov 2024
Tianyi Qiu, a CHAI intern, wrote this paper, which was accepted at the NeurIPS 2024 Pluralistic Alignment Workshop.
Getting By Goal Misgeneralization With a Little Help From a Mentor
10 Oct 2024
Khanh Nguyen, Mohamad Danesh, Ben Plaut, and Alina Trinh wrote this paper, which was presented at the Towards Safe & Trustworthy Agents Workshop at NeurIPS 2024.
“The Alignment Problem” Wins Xingdu Book Award
29 Aug 2024
Brian Christian’s book “The Alignment Problem” was announced as the sole winner in the New Knowledge Category for Imported Editions at the Xingdu Book Award ceremony in China. The Chinese translation was published this past year by Hunan Science & Technology Press.
Forget deepfake videos. Text and voice are this election’s true AI threat.
08 Jul 2024
Jonathan Stray, Senior Scientist at CHAI, and Jessica Alter, tech entrepreneur and co-founder of Tech for Campaigns, wrote an op-ed for The Hill about the risks AI poses in the current election cycle.
Mitigating Partial Observability in Decision Processes via the Lambda Discrepancy
28 Jun 2024
This paper studies how to detect and mitigate partial observability by measuring the discrepancy between value function estimates. The paper was presented at the “Finding the Frame” workshop at RLC 2024 and the “Foundations of Reinforcement Learning and Control” workshop at ICML 2024.