
Increasing AI Strategic Competence as a Safety Approach

Published on February 3, 2026 1:08 AM GMT

If AIs became strategically competent enough, they might realize that RSI is too dangerous because they're not good enough at alignment or philosophy or strategy, and potentially convince, help, or work with humans to implement an AI pause. This presents an alternative "victory condition" that someone could pursue (e.g. by working on AI strategic competence) if they were relatively confident about the alignment of near-human-level AIs but concerned about the AI transition as a whole, for example because they're worried about the alignment of ASI, or about correctly solving other philosophical problems that would arise during the transition. (But note that if the near-human-level AIs are not aligned, this effort could backfire by letting them apply better strategy to take over more easily.)

Strategic vs Philosophical Competence

The previous "victory path" I've been focused on was to improve AI philosophical competence, under the theory that if the AIs are aligned, they'll want to help us align the next generation of AIs and otherwise help guide us through the AI transition. I think by default they will be too incompetent at philosophical reasoning to do a good enough job at this, hence the proposal to improve such competence. However, accomplishing this may well be too hard, which led me to this new idea. I note that high-level strategic competence shares some characteristics with philosophical competence, such as sparse or absent feedback from reality and dependence on human evaluations, but it may be significantly easier to improve, due to more conceptual clarity about the target being aimed for, and continuity with other, easier-to-train capabilities such as low- and mid-level strategy.

Unilateral Refusal vs AI Assistance for Pausing AI

I found a couple of related posts: AIs should also refuse to work on capabilities research by @Davidmanheim and this shortform by Vladimir Nesov. There's also an earlier paper that makes a similar point as David Manheim's post, which focuses on AIs unilaterally refusing to do capabilities research. But I think this approach has two issues:

1. The AIs may not be strategically competent enough to decide to refuse, similar to how a large number of humans are not refusing to work on AI capabilities research.
2. Such unilateral refusal is a form of intent misalignment, and seems relatively easy for AI companies to "correct" or prevent using standard control and/or alignment techniques. (This comment by @tanae makes a similar point.)

In comparison, my "victory path" sees some humans working deliberately to increase AI strategic competence, and instead of the AIs unilaterally refusing to contribute to RSI, the AIs help or work with more humans (including via argumentation/persuasion/advice) to implement a global RSI pause.

