Opinion - AI News

Opinion

A Tale of Three Contracts

AI News Team March 4, 2026

The attempt on Friday by Secretary of War Pete Hegseth to label Anthropic as a…

Opinion

An Alignment Journal: Coming Soon

AI News Team March 4, 2026

tl;dr: We’re incubating an academic journal for AI alignment: rapid peer review of foundational alignment research…

Opinion

Current activation oracles are hard to use

AI News Team March 4, 2026

This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. tl;dr: Activation…

Opinion

Question: Why is the goal of AI safety not ‘moral machines’?

AI News Team March 3, 2026

There is a basic question that has been confusing me for a while that I…

Opinion

An Age Of Promethean Ambitions

AI News Team March 3, 2026

In a recent post, I wrote the following: [Without] the context of history, we're blind to…

Opinion

White-Box Attacks on the Best Open-Weight Model: CCP Bias vs. Safety Training in Kimi K2.5

AI News Team March 3, 2026

Over the last month I have been trying to see just how much I can…

Opinion

I Had Claude Read Every AI Safety Paper Since 2020, Here’s the DB

AI News Team March 3, 2026

Click here if you just want to see the Database I made of all[1] AI safety…

Opinion

Constitutional Black-Box Monitoring for Scheming in LLM Agents

AI News Team March 3, 2026

Paper: https://arxiv.org/abs/2603.00829 Thread: https://x.com/syghmon/status/2028878121051496674 Executive Summary: Black-box monitors can detect scheming in AI agents using only externally observable…

Opinion

In-context learning of representations can be explained by induction circuits

AI News Team March 3, 2026

This is a crosspost of my ICLR 2026 blogpost track post. All code and experiments…

Opinion

Single Direction vs Low-Rank Refusal in Small LLMs

AI News Team March 3, 2026

Introduction: I recently came across an Alignment Forum post that showed refusal behaviors in LLMs can…