Alignment Science Blog

November 2024

Three Sketches of ASL-4 Safety Case Components

We sketch out three hypothetical arguments one could make to rule out misalignment risks in powerful near-future models capable of sabotage.

About the Alignment Science Blog

We are Anthropic's Alignment Science team. We do machine learning research on the problem of steering and controlling future powerful AI systems, as well as on understanding and evaluating the risks they pose. Welcome to our blog!

This blog is inspired by the informal updates on our Interpretability team's Transformer Circuits thread. We'll use it to release research notes and early findings that we don't think warrant a full publication but that might nonetheless be useful to others working on similar problems.

P.S. We're hiring in the US and in London!