“Look,” whispered Chuck, and George lifted his eyes to heaven. (There is always a last time for everything.)
Overhead, without any fuss, the stars were going out.
Arthur C. Clarke, The Nine Billion Names of God
Introduction
In the tradition of fun and uplifting April Fool’s Day posts, I want to talk about four ways that AI safety (as a movement/field/forum/whatever) might “go out with a whimper”. By “go out with a whimper” I mean that, as we approach some critical tipping point for capabilities, work in AI safety theory or practice might actually slow down rather than speed up. I see all of these failure modes to some degree today, and I expect they may become more prominent in the near future.
Mode 1: Prosaic Capture
This one is fairly self-explanatory. As AI models get stronger, more and more AI safety people are recruited and folded into lab safety teams doing product safety work. This work is technically complex, intellectually engaging, and genuinely getting more important; after all, the technology is getting more powerful at a dizzying rate. At the same time, attention is diverted from the more “speculative” issues that used to dominate AI alignment discussion, largely because the systems we have right now look closer and closer to fully-fledged AGIs/ASIs, so it seems natural to focus on analysing the behaviour and tendencies of LLM systems, especially when those tendencies meaningfully affect how AI systems interact with humans in the wild.
As a result, if there is some latent Big Theory Problem underlying AI research (not only in the MIRI sense but also in the sense of “are corrigible optimiser agents even a good target?” or “how do we align the humans?” and similar questions), it may actually receive less attention over time as we approach some critical inflection point.
Mode 2: Attention Capture
Many people in AI safety now closely collaborate with, or depend on, AI agents such as Claude Code or OpenAI Codex for research, while also using Claude or ChatGPT as everything from a theoretical advisor to a life coach. In some sense this is even worse than quotes like “scheming viziers too cheap to meter” would imply: imagine if the leaders of the US, UK, China, and the EU all talked to the same 1-3 scheming viziers, on loan from the same three consulting firms, all day.
I suspect that this is really bad for community epistemics, for a bunch of reasons. For example, whatever the agents refuse to do, or do poorly, will receive less focus, in a kind of streetlight effect: the community works where the light is. Practically speaking, what the models are good at becomes what the community is good at, or what the community can do easily, because pushing against the flow means appearing (or genuinely becoming) slow, cumbersome, and less efficient. At the same time, if there are undetected biases in the agents that favour certain methodologies, experiments, or interpretations, those will quietly become the default background priors for the community. Does Claude or Gemini favour the linear representation hypothesis or the platonic representation hypothesis?
In effect, reliance on models draws a bounding box around the space of ideas, separating those that are easier to work with from those that are harder, so long as the models are not literally perfect at every task type. If the resulting cluster of accessible ideas does not match the core ideas we should be looking at to solve alignment/safety, then the community naturally drifts away from tackling the central issues. The drift is coordinated as well, because everyone is using the same tools, manufacturing a kind of forced information cascade with the model at the centre.
Mode 3: Loss of Capability
Right now, the world is facing an unprecedented attack on its epistemics and means of truth-seeking, thanks to AI systems that can generate convincing fake images or videos of almost anything. This technology is being embraced at the highest levels of state and is also spreading rapidly online. At the same time, the idea of epistemic capture from LLM use, and the broader concern over “AI psychosis”, reflect what I think is a pretty reasonable worry about talking to a confabulating simulator all day, no matter how intelligent it is.
At the limit, I worry that people who might otherwise contribute to AI safety are instead “captured” by LLM partners or LLM-suggested thought patterns that are not actually productive, chasing rabbit holes or dead ends that lead to wasted time and effort or (in worse cases) mental and physical harm. In effect, this means there are fewer well-balanced, capable people to draw on when the community faces its most severe challenges. By the way, I think this is a problem for many organisations around the world, not just the AI safety community.
Mode 4: Disillusionment
AI safety and ethics are increasingly the subject of heated political debate, which can place profound mental and emotional stress on people in these fields. Eventually, people might burn out or simply switch careers, right as the topic is at its most important.
Potential mitigations
I didn’t want to write a purely depressing post, so here are my ideas for how to address these issues:
Portfolio diversification: Funders and organisations should allocate some of their resources (not a majority, but not a token amount either) to ensuring that a wide portfolio of ideas is supported, so that there is room to pivot quickly if the situation changes drastically. (And if you don’t think the situation will change drastically, why are you so sure about that? After all, in 2019 the situation didn’t seem ready to change drastically either.)
Developing alternative working structures: LLMs are clearly good at a lot of things. However, I suspect that some kind of cognitive “back-benching” may be helpful, where people serve as a sanity check or weathervane to monitor whether the community as a whole is drifting in certain directions. In particular, I would be interested in funding people to do research that LLMs seem bad at right now. And if we don’t know what they are bad at, I think we should find out fast!
Investing in community health: AI and AI safety are famously stressful fields. Investing in community health measures, and reducing the emphasis on constant acceleration and grinding, gives people the slack to defend themselves against burnout and other forms of cognitive and psychological pressure. Of all the measures I have suggested, I think this one is the most nebulous but also the most important. As a community tackling a hard problem, we should be prepared to help each other through hard times, and not only on paper or by offering funding.