Opinion Realistic Reward Hacking Induces Different and Deeper Misalignment AI News Team October 10, 2025 Published on October 9, 2025 6:45 PM GMT TL;DR: I made a dataset of realistic harmless…
Opinion The Thinking Machines Tinker API is good news for AI control and security AI News Team October 9, 2025 Published on October 9, 2025 3:22 PM GMT Last week, Thinking Machines announced Tinker. It’s an…
Opinion Are We Leaving Literature To The Psychotic? AI News Team October 9, 2025 Published on October 9, 2025 6:09 AM GMT Those who have fallen victim to LLM psychosis…
Opinion Lessons from the Mountains AI News Team October 9, 2025 Published on October 9, 2025 4:10 AM GMT How close have you come to death? I don't…
Opinion Probabilistic Societies AI News Team October 9, 2025 Published on October 9, 2025 4:08 AM GMT Prediction markets are everywhere. Information Distribution Systems: At the core…
Opinion Inverting the Most Forbidden Technique: What happens when we train LLMs to lie detectably? AI News Team October 9, 2025 Published on October 9, 2025 12:43 AM GMT This is a write-up of my recent work…
Opinion Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior AI News Team October 9, 2025 Published on October 8, 2025 10:02 PM GMT This is a link post for two papers…
Opinion NEPA, Permitting and Energy Roundup #2 AI News Team October 9, 2025 Published on October 8, 2025 8:20 PM GMT It’s been about a year since the last…
Opinion What shapes does reasoning take but circular? AI News Team October 9, 2025 Published on October 8, 2025 8:18 PM GMT In a blog post about local and global…