Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems
03 Jan 2023
Alex Turner (CHAI) wrote a blog post on the AI Alignment Forum titled "Inner and outer alignment decompose one hard problem into two extremely hard problems."
One prevalent alignment strategy is to 1) capture "what we want" in a loss function to a very high degree, 2) use that loss function to train the AI, and 3) get the AI to exclusively care about optimizing that objective. The essay argues that each step introduces either a serious and unnecessary difficulty or an unnecessary assumption.