Opinion An Alignment Journal: Coming Soon AI News Team March 4, 2026 tl;dr We’re incubating an academic journal for AI alignment: rapid peer-review of foundational Alignment research…
Industry ChatGPT’s new GPT-5.3 Instant model will stop telling you to calm down AI News Team March 4, 2026 The company says the new model will reduce the "cringe" that's been annoying its users…
Industry Claude Code rolls out a voice mode capability AI News Team March 4, 2026 Anthropic is stepping up its game in the AI coding space with the rollout of…
Opinion Current activation oracles are hard to use AI News Team March 4, 2026 This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. tl;dr: Activation…
Industry X says it will suspend creators from revenue-sharing program for unlabeled AI posts of ‘armed conflict’ AI News Team March 3, 2026 Creators who break the rules will get a 3-month suspension, and if they continue to…
Opinion Question: Why is the goal of AI safety not ‘moral machines’? AI News Team March 3, 2026 There is a basic question that has been confusing me for a while that I…
Opinion An Age Of Promethean Ambitions AI News Team March 3, 2026 In a recent post, I wrote the following: [Without] the context of history, we're blind to…
Opinion White-Box Attacks on the Best Open-Weight Model: CCP Bias vs. Safety Training in Kimi K2.5 AI News Team March 3, 2026 Over the last month I have been trying to see just how much I can…
Opinion I Had Claude Read Every AI Safety Paper Since 2020, Here’s the DB AI News Team March 3, 2026 Click here if you just want to see the Database I made of all AI safety…
Opinion Constitutional Black-Box Monitoring for Scheming in LLM Agents AI News Team March 3, 2026 Paper: https://arxiv.org/abs/2603.00829 Thread: https://x.com/syghmon/status/2028878121051496674 Executive Summary: Black-box monitors can detect scheming in AI agents using only externally observable…