Opinion Test your interpretability techniques by de-censoring Chinese models AI News Team January 15, 2026 Published on January 15, 2026 4:33 PM GMT This work was conducted during the MATS 9.0…
Opinion Reflections on TA-ing Harvard’s first AI safety course AI News Team January 15, 2026 Published on January 15, 2026 4:28 PM GMT This fall Boaz Barak taught Harvard’s first AI…
Opinion I Made a Judgment Calibration Game for Beginners (Calibrate) AI News Team January 15, 2026 Published on January 15, 2026 3:04 PM GMT I made a game that teaches beginner calibration.…
Opinion AI #151: While Claude Coworks AI News Team January 15, 2026 Published on January 15, 2026 2:30 PM GMT Claude Code and Cowork are growing so much…
Opinion Corrigibility Scales To Value Alignment AI News Team January 15, 2026 Published on January 15, 2026 12:05 AM GMT Epistemic status: speculation with a mix of medium…
Opinion Deeper Reviews for the top 15 (of the 2024 Review) AI News Team January 15, 2026 Published on January 14, 2026 11:59 PM GMT We're extending the Discussion Phase of the 2024…
Opinion Boltzmann Tulpas AI News Team January 15, 2026 Published on January 14, 2026 9:45 PM GMT (A work of anthropic theory-fiction). Motivating question: Why do…
Opinion Status In A Tribe Of One AI News Team January 15, 2026 Published on January 14, 2026 8:44 PM GMT I saw a tweet thread the other day,…
Opinion Quantifying Love and Hatred AI News Team January 15, 2026 Published on January 14, 2026 8:40 PM GMT Imagine a friend gets kidnapped by mobsters who…
Opinion Why we are excited about confession! AI News Team January 15, 2026 Published on January 14, 2026 8:37 PM GMT Boaz Barak, Gabriel Wu, Jeremy Chen, Manas Joglekar [Linkposting…