Opinion

Monday AI Radar #21

This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that responsibly, but the next year or two will be challenging for security. If you haven’t already, now is a good time to review and improve your personal security practices.Cybersecurity isn’t the only story here: Mythos appears to be the first of the next generation of much larger models. Early data suggest it represents another acceleration of the rate of capability progress, although that’s hard to assess while it’s still in limited release. And from a safety perspective, Anthropic says this is simultaneously the most aligned model they’ve ever created and the most dangerous.Top pickHow scary is Claude Mythos?Rob Wiblin’s analysis of Mythos covers all the key points. If you only read this piece, you won’t miss anything vital.Mythos Preview is another milestone on the race to AGI, arguably as significant as the November 2025 release of Opus 4.5 that kicked off the agentic coding craze. Rob covers both sides of this story: Mythos is the first model powerful enough to cause a major crisis if misused, and (as far as we can tell) also better aligned than any previous Anthropic model.I expect strong disagreement about how those two factors balance out. Some people will see Mythos as evidence that we are rushing toward AGI without having solved alignment, and others will argue that alignment is progressing as fast as capabilities and we’ll probably manage to muddle through. I believe those aren’t mutually exclusive: we are rushing toward AGI with an alignment strategy that is probably good enough to muddle through with, but which has a real chance of getting us all killed.Mythos is evidence for short timelines, bringing a big step forward for capabilities that is at least consistent with past trendlines and might represent an inflection point toward even faster progress.My writingQuick thoughts about MythosA few quick thoughts about the release of Claude Mythos Preview.Foundational beliefsSix foundational beliefs that shape how I think about AI safety strategy.Writing with robotsAI can’t write well, but it’s a great editor—here’s how I use it.MythosAll of the following pieces are good, but most of you can just read the summaries and pick and choose which links to follow.Mythos Preview’s cybersecurity capabilitiesMythos is better at finding and exploiting vulnerabilities than any past model:Anthropic’s analysis is spot on:There’s no denying that this is going to be a difficult time. While we hope that some of the suggestions above will be helpful in navigating this transition, we believe the capabilities that future language models bring will ultimately require a much broader, ground-up reimagining of computer security as a field.As part of that reimagining, Anthropic is giving key companies a head start in the cybersecurity arms race via Project Glasswing. This seems like the best path forward, which doesn’t mean it’s guaranteed to succeed.Ryan Greenblatt estimates the impact of MythosAn uncontrolled release could have been ugly:If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damagesThe Zvi reportZvi does a two-part deep dive, covering the system card and the cybersecurity implications. Excellent, comprehensive, long.New sages unrivalledDean Ball argues that Mythos marks a new era for AI. I agree, but I don’t have to like it.I wrote on X that Mythos means the training wheels are coming off on AI policy. Perhaps the Department of War’s effort to strangle Anthropic is, to use another metaphor, a sign that the gloves are off too. If the last month has made anything clear, it is that we are in a nastier, sharper, harsher, meaner era of AI discourse, policy, and—ultimately—of AI development and use.Failing to understand and plan for this new era might be the biggest unforced error the AI safety community will make over the next couple of years. Much more than previously, many key players will be motivated by ruthless self-interest rather than an altruistic desire to do what is best for humanity. We need to accept that fact and plan accordingly.Benchmarks and ForecastsRyan Greenblatt’s model of AI progressRyan Greenblatt has two long posts on the present state of AI and likely AI timelines. Highly recommended for a deep, gears-level model of how AI capabilities are likely to progress, and especially what the trajectory of AI R&D might look like. The headline result is that based on recent progress, Ryan (like many other people) is shortening his timeline to highly capable AI.A core part of his thesis is that AI is now immensely capable at coding tasks that are easy to verify. He argues that the human-equivalent time horizon for those tasks is now somewhere between months and years, which represents a superexponential rate of progress. That sounds right—the open question is how quickly we make progress on verifying more complex tasks.In light of Mythos, he estimates that AI is making Anthropic engineers 1.75x faster, but the overall speedup of Anthropic’s AI R&D is only 1.2x. It’s too early to tell whether that’s the early stage of an intelligence explosion, or an indication that other factors will bottleneck progress and prevent runaway acceleration.Musings on recursive self-improvementSeb Krier is skeptical that recursive self-improvement will go as fast as some people think:When people talk about recursive self-improvement, they sometimes acknowledge these frictions but then treat them as secondary, or assume that sufficiently capable systems can route around most of them via internal deployments and accelerated R&D. I think this is often overstated: these bottlenecks do not disappear just because model development speeds up. They are structural, not incidental, and they push strongly against the more explosive versions of the RSI story.It’s a great piece that goes beyond the usual “diffusion is slow” thesis. He makes a good case that AI progress will be tethered to—and rate limited by—human factors in ways that prevent a runaway takeoff.It’s a strong piece, and he points out some important dynamics. But beyond a certain capability level, I believe AI will be able to rapidly transform the world on its own regardless of whether human society can keep up.Jobs and the economyThe Windfall Policy AtlasThe newly released Windfall Policy Atlas is a great resource for anyone thinking about how to mitigate the economic and employment impacts of AI. It lists 48 potential policy levers (shortened work weeks, robot taxes, etc.), each with a description of how the policy might work and some selected reading.Autonomous weaponsThe global AI arms raceThe New York Times reviews the state of autonomous weapons ($). Fully autonomous weapons haven’t yet transformed the battlefield but capabilities are growing quickly, in part because of rapid iteration in Ukraine. At the current rate of progress, autonomous weapons will soon be essential in any armed conflict. It’s increasingly hard to see how a treaty against autonomous weapons is achievable, given rising global tensions and increased military spending.Strategy and politicsDaniel Kokotajlo and Dean Ball debate government’s role in AIThis is great: two strong thinkers in a debate format structured to maximize truth-seeking and finding common ground. Spoiler: plenty of tough problems, not so many easy answers.Can Sam Altman be trusted?The New Yorker has a long and devastating piece on Sam Altman’s history of lying and manipulation ($). It isn’t news that he is frequently dishonest, but this is the most comprehensive examination of the full scope of the problem.This is particularly distressing in light of the issues raised by Daniel and Dean above. If you don’t trust the government to manage AI and you don’t trust the CEO of one of the leading labs, that’s hardly ideal.Political violence is never acceptableZvi points out what ought to be obvious to any person with a functioning moral compass.We need more grantmakersSophie Kim and Ady Mehta argue that AI safety is critically constrained not by funding, but by the ability to usefully deploy funding:The capital is about to scale by orders of magnitude; the capacity to deploy it has not. This post is about that gap– and why filling it matters more than almost anything else in AI safety right now.Sketches of some defense-favoured coordination techForethought’s latest brainstorming piece explores how to use AI for coordination:We think that near-term AI could make it much easier for groups to coordinate, find positive-sum deals, navigate tricky disagreements, and hold each other to account.There are some intriguing ideas here. In particular, the background networking proposal seems like something a single person could deploy at a conference or other small event.Open modelsCan Chinese and open model companies keep up?Epoch’s Anson Ho explores the question of whether the Chinese and open model companies (which are not quite the same thing) can keep up with the frontier labs. It’s a solid analysis that considers compute capacity, distillation, how innovations spread, and more.There isn’t a simple answer, but he leans toward believing it will be hard to close the capability gap while the compute gap remains:For me the primary takeaway is this: compute is the biggest factor for which companies can compete at the capabilities frontier — efficiency matters too, but it’s probably not enough to make up for ten times less compute.Claude Mythos and misguided open-weight fearmongeringNathan Lambert argues against assuming that open models are too dangerous in a world with Mythos-level capabilities. It’s a thoughtful piece, but I’m unconvinced: if open models continue to progress rapidly, it’s hard to see how they don’t become broadly dangerous.Do we need an open model consortium?The open model world has recently faced challenges with key personnel leaving and hard questions about long-term financial viability. Nathan Lambert proposes a solution:a consortium is the only long-term stable path to well-funded, near-frontier open models.Perhaps, but that’s easier said than done. I’m curious about NVIDIA’s role here: they’re the only player with a clear funding strategy, but it’s hard to figure out their long-term motivations in this space.TechnicalTraining LLMs to predict world eventsThinking Machines and Mantic discuss how to build an AI forecasting system that approaches the performance of human experts. I was amused to see that even though Grok wasn’t a particularly good forecaster, it was the most valuable member of the forecasting ensemble because its predictions were highly decorrelated from the other models.Discuss Read More

Related Posts

Backyard cat fight shows Schelling points preexist language

Where I Am Donating in 2025

Now, “Tethering” is a Bigger Educational Problem Than “Involution”

Leave a Reply Cancel reply