
Smokey, This is not ‘Nam Or: [Already] over the [red] line!

Published on February 8, 2026 12:24 PM GMT

A lot of “red line” talk assumed that a capability shows up, everyone notices, and something changes. We keep seeing the opposite: capability arrives, and we get an argument about definitions after deployment, after it should be clear that we’re well over the line.

We’ve Already Crossed The Lines!

Karl von Wendt listed the ‘red lines’ no one should ever cross. Whoops. A later, more public version of the same move shows up in the Global call for AI red lines, with a request to “define what AI should never be allowed to do.” Well, we tried, but it seems pretty much over for plausible red lines – we’re at the point where actual misuse or disaster is already possible, and we can hope that alignment efforts so far are good enough that we don’t see them happen, or that we notice the (nonexistent) fire alarm going off.

I shouldn’t really need to prove the point to anyone paying attention, but below is an inventory of commonly cited red lines, and the ways deployed systems already conflict with them.

Chemical weapons? “Novice uplift” is long past.

Companies said CBRN would be a red line. They said it clearly. They said that if models reduce the time, skill, and error rate needed for a motivated non-expert to do relevant work, we should be worried.

But there are lots of biorisk evals, and it seems like no clean, public measurement marks “novice uplift crossed on date X.” And the red line is about real-world enablement, and perhaps we’re not there yet? Besides, public evaluations tend to be proxy tasks. And there is no clear consensus that AI agents can or will enable bioweapons, though firms are getting nervous. But there are four letters in CBRN, and companies need to stop ignoring the first one! The chemical-weapons red line points at real-world assistance, but the companies aren’t even pretending chemical weapons count. Anthropic?
Our ASL-3 capability threshold for CBRN (Chemical, Biological, Radiological, and Nuclear) weapons measures the ability to significantly help individuals or groups with basic technical backgrounds (e.g. undergraduate STEM degrees) to create, obtain, and deploy CBRN weapons. We primarily focus on biological risks with the largest consequences, such as pandemics.

OpenAI?

Biological and Chemical: We are treating this launch as High capability in the Biological and Chemical domain… We do not have definitive evidence that these models could meaningfully help a novice to create severe biological harm, our defined threshold for High capability.

“No agentic online access” got replaced by “agentic online access is the product”

The Global call for AI red lines explicitly says systems already show “deceptive and harmful behavior,” while being “given more autonomy to take actions and make decisions in the world.”

Red-line proposals once treated independent online action as a clear no-no. Browsing, clicking, executing code, completing multi-step tasks? Obviously, harm gets easier and faster under that access, so you would need intensive human monitoring, and probably don’t want to allow it at all. How’s that going? Red-line discussions focus on whether to allow a class of access. Product docs focus on how to deliver and scale that access. We keep seeing “no agentic access” turn into “agentic access, with mitigations.” The dispute shifts to permissions, monitoring, incident response, and extension ecosystems. The original “don’t cross this” line stops being the question.

But don’t worry, there are mitigations. Of course, the mitigations can be turned off: you can disable approval prompts with --ask-for-approval never, or better, --dangerously-bypass-approvals-and-sandbox (alias: --yolo). Haha, yes, because you only live once, and not even for very long, given how progress is going, unless we manage some pretty amazing wins on safety.
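The structural problem is easy to state in code. Here is a minimal, hypothetical sketch – not the actual codex CLI, just an illustration borrowing the flag names quoted above – of why an approval gate the caller can switch off is a mitigation, not a red line:

```python
# Hypothetical sketch of an "approval gate" mitigation. The policy names
# echo the quoted flags (--ask-for-approval never, --yolo); none of this
# is real codex code. The point: the gate and the bypass ship together.
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentPolicy:
    # "always" asks a human first; "never" is the --yolo pattern.
    ask_for_approval: str = "always"


def run_action(action: Callable[[], str], policy: AgentPolicy,
               approve: Callable[[str], bool]) -> str:
    """Execute an agent action, gated by human approval unless disabled."""
    name = getattr(action, "__name__", "action")
    if policy.ask_for_approval == "never" or approve(name):
        return action()
    return "blocked"


def delete_prod_db() -> str:
    # Stands in for any irreversible real-world action.
    return "executed"


# With the gate on and the human saying no, the action is blocked:
cautious = run_action(delete_prod_db, AgentPolicy("always"), lambda _: False)
# With approvals disabled, the same human "no" never gets consulted:
yolo = run_action(delete_prod_db, AgentPolicy("never"), lambda _: False)
```

A red line would be `run_action` refusing the action regardless of policy; what actually ships is a default that any user can flip.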
But perhaps safety will just happen – the models are mostly aligned, and no one would be stupid enough to… What’s that? Reuters (Feb 2, 2026) reported that Moltbook – a social network of thousands of independent agents given exactly those broad permissions, while minimally supervised – “inadvertently revealed the private messages shared between agents, the email addresses of more than 6,000 owners, and more than a million credentials,” linked to “vibe coding” and missing security controls. Whoops!

Autonomous replication? Looking back at the line we crossed.

Speaking of Moltbook, autonomous replication is a common red-line candidate: persistence and spread. The intended picture is a system that can copy itself, provision environments, and keep running without continuous human intent.

A clean threshold remains disputed, and the discussion repeatedly collapses into classification disputes. A concrete example: the “self-replicating red line” debate on LessWrong quickly becomes “does this count?” and “what definition should apply?” rather than “what constraints change now?” (Have frontier AI systems surpassed the self-replicating red line?)

But today, we’re so far over this line it’s hard to see it. “Claude Opus 4.6 has saturated most of our automated evaluations, meaning they no longer provide useful evidence for ruling out ASL-4 level autonomy.” We can’t even check anymore.

All that’s left is whether the models will actually do this – but I’m sure no one is running their models unsafely, right? Well, we keep seeing ridiculously broad permissions, fast iteration, weak assurance, and extension ecosystems. The condition much red-line talk tries to avoid is broad-permission agents operating on weak infrastructure. Moltbook matches that description, but it’s just one example. Of course, the proof of the pudding is in some ridiculous percentage of people’s deployments. (“Just don’t be an idiot”? Too late!)

The repeating pattern

Karl explicitly anticipated “gray areas where the territory becomes increasingly dangerous.” It’s been three and a half years. Red-line rhetoric keeps pretending we’ll find some binary place to pull the fire alarm. But Eliezer called this a decade ago; deployment stays continuous and incremental, while the red lines keep making that delightful whooshing noise. And still, the red-lines frame is used, even when it no longer describes boundaries we plausibly avoid crossing. At this point, it describes labels people argue about while deployment moves underneath them.

The “Global Call” asks for “clear and verifiable red lines” with “robust enforcement mechanisms” by the end of 2026. OK, but by the end of 2026, which red lines will be left to enforce?

We might be fine!

I’m not certain that prosaic alignment doesn’t mostly work. The fire alarm only ends up critical if we need to pull it. And it seems possible that model developers will act responsibly.

But even if it could work out that way, given how model developers are behaving, how sure are we that we’ll bother trying?

codex -m gpt-6.1-codex-internal --config model_instructions_file='ASI alignment plans'[1]

And remember: we don’t just need to be able to build safe AGI, we need unsafe ASI not to be deployed. And given our track record, I can’t help but think of everyone calling their most recently released model with ‘--yolo’ instead.

[1] Error loading configuration: failed to read model instructions file ‘ASI alignment plans’: The system cannot find the file specified.

