Opinion No Strong Orthogonality From Selection Pressure AI News Team April 30, 2026 TL;DR: If everything goes according to plan, by the end of this post we should have…
Opinion Computation in Superposition: Two Handcrafted Models AI News Team April 30, 2026 Many interpretability researchers (ourselves included) believe that neural networks store knowledge in superposition—that is, networks…
Opinion Research Sabotage in ML Codebases AI News Team April 30, 2026 One of the main hopes for AI safety is using AIs to automate AI safety…
Opinion The fall of the theorem economy (David Bessis) AI News Team April 30, 2026 I found this post from mathematician David Bessis very interesting. It explains that while AI…
Opinion Probe-Based Data Attribution: Surfacing and Mitigating Undesirable Behaviors in LLM Post-Training AI News Team April 30, 2026 Introduction: Research by Frank Xiao (SPAR mentee) and Santiago Aranguri (Goodfire). Post-training can introduce undesired side effects…
Opinion Book review: The Infinity Machine AI News Team April 30, 2026 Book review: The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence, by Sebastian…
Opinion Poisoning Fine-tuning Datasets of Constitutional Classifiers AI News Team April 29, 2026 The primary contributors to this work are Chase Bowers, Faizan Ali, John Hughes, Jerry Wei,…
Opinion Lorxus Does Budget Inkhaven Again: 04/22~04/28 AI News Team April 29, 2026 I'm doing Budget Inkhaven again! (I didn't realize last time that "Halfhaven" also meant specifically…
Opinion AGI is Probably Inevitable: A Model of Societal Ruptures AI News Team April 29, 2026 Introduction: I posted a quick take a few days back claiming AI moratoriums won't work,…