Opinion The Topology of LLM Behavior AI News Team February 28, 2026 I have this mental image that keeps coming back when I do prompt engineering. It's…
Opinion Coherent Care AI News Team February 28, 2026 I've been trying to gather my thoughts for my next tiling theorem (agenda write-up here;…
Opinion The tick in my back AI News Team February 28, 2026 It’s been almost sixteen years, I suspect, since the tick entered my body. It must…
Opinion Side by Side Comparison of RSP Versions AI News Team February 28, 2026 With all of the discussion about changes to Anthropic's Responsible Scaling Policy, I figured actually…
Opinion Anthropic and the DoW: Anthropic Responds AI News Team February 28, 2026 The Department of War gave Anthropic until 5:01pm on Friday the 27th to either give…
Opinion Ball+Gravity has a “Downhill” Preference AI News Team February 28, 2026 [epistemic status: This is a rambling thought experiment with the goal of clarifying my ontological…
Opinion Safe ASI Is Achievable: The Finite Game Argument AI News Team February 28, 2026 A few days ago, Anthropic dropped the central pledge of its Responsible Scaling Policy, the…
Opinion Best short introductions to AI safety & alignment for bright college students? AI News Team February 27, 2026 Hi, I've been asked to recommend a couple of short introductions/overviews about the key issues…
Opinion New ARENA material: 8 exercise sets on alignment science & interpretability AI News Team February 27, 2026 TLDR: This is a post announcing a lot of new ARENA material I've been working on…
Opinion 3 Challenges and 2 Hopes for the Safety of Unsupervised Elicitation AI News Team February 27, 2026 Authors: Callum Canavan*, Aditya Shrivastava*, Allison Qi, Jonathan Michala, Fabien Roger (*Equal contributions, alphabetical) tl;dr: We study…