International Law Cannot Prevent Extinction Either
The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass and write a whole-assed post.

I agree with Eliezer's main thesis that individual violence against AI researchers is both morally wrong and strategically stupid. Where I disagree is with the claim that international law can prevent extinction. It can't, for the following reasons.

I. International law is largely a fiction (especially when interests diverge sharply)

The analogy with nuclear weapons is a poor one. North Korea signed the Nuclear Non-Proliferation Treaty and developed nuclear weapons anyway. The treaty deterred only those who weren't strongly motivated to begin with. And the reason the US and Russia didn't nuke each other has nothing to do with international treaties (see point II).

In practice, powerful countries disregard international law whenever they want. A stark example is the Budapest Memorandum: in 1994, Ukraine agreed to give up its nuclear arsenal in exchange for written assurances of its sovereignty and territorial integrity from Russia, the US, and the UK. Russia annexed Crimea in 2014, and the international community expressed concern. Russia launched a full-scale invasion in 2022, and the first thing the international community did was block the bank cards of anti-war citizens fleeing Russia. No military intervention ever materialized. Putin is doing just fine.

There are no stable enforcement mechanisms to address violations of international law. This is not a world of parties engaging in good-faith negotiations. It is a world in which Putin, Xi, and Trump treat international commitments as empty theater for the cameras.

II. The AI race is perceived as asymmetrical, unlike nuclear MAD

The proposed AI treaty is compared to nuclear non-proliferation, but the underlying incentive structures in the two cases differ radically.

As Eliezer noted, neither Soviet nor American leadership expected to have a good day if an actual nuclear war happened. They understood that a first strike wouldn't prevent devastating retaliation. Carl Sagan and colleagues reinforced these fears with their nuclear winter research, arguing that even a limited nuclear exchange could trigger catastrophic global cooling. The Soviets refrained from launching a first strike not out of adherence to international treaties, but because they genuinely perceived the situation as lose-lose.

However, the AI race appears to have a different payoff structure, at least as most people perceive it. If you develop ASI first, you potentially win decisively, preventing retaliation altogether. This creates immense pressure to defect.
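A toy game makes the contrast concrete. The sketch below is my own illustration, and every payoff number in it is an invented stand-in chosen only to reproduce the orderings described above: under MAD-style perceived payoffs, restraint is the best response to everything, while under winner-take-all perceived payoffs, racing dominates.

```python
# A toy 2x2 game contrasting the two perceived payoff structures.
# All numbers are invented for illustration; only their ordering matters.

def dominant_action(payoffs):
    """Return an action that is a best response to every opponent action, if any."""
    actions = list(payoffs)
    for mine in actions:
        if all(payoffs[mine][theirs] >= payoffs[other][theirs]
               for other in actions for theirs in actions):
            return mine
    return None

# Perceived nuclear MAD: a first strike cannot prevent devastating
# retaliation, so every branch involving war reads as roughly -100.
mad = {
    "restrain": {"restrain": 0,    "strike": -100},
    "strike":   {"restrain": -100, "strike": -100},
}

# Perceived AI race: build ASI first and you "win decisively, preventing
# retaliation altogether"; both racing is perceived as a coin flip (~0).
race = {
    "pause": {"pause": 0,   "race": -100},
    "race":  {"pause": 100, "race": 0},
}

print(dominant_action(mad))   # restrain -> a treaty merely codifies self-interest
print(dominant_action(race))  # race     -> a treaty must fight self-interest
```

The treaty's job, in other words, is not to enforce an equilibrium that already exists (as with MAD) but to overturn the equilibrium everyone believes they are in.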
III. There is virtually zero possibility of consensus on AI risk, unlike nuclear weapons

To be fair, the win-lose perception described above is probably wrong. In most cases, a misaligned ASI is catastrophic regardless of who builds it first. So the true payoff structure is likely to be lose-lose, just like nuclear war. But that doesn't matter for treaty prospects, because behavior is driven by perceived payoffs, not actual ones.

Eliezer's treaty seems to require that everyone becomes so terrified of ASI that only a madman would violate it. Is that realistic? I'd say that if a scientific consensus on this topic emerges, it will be a great first step towards such a world. Eliezer writes that a few hundred computer scientists, Nobel laureates, and others have called AI an extinction risk. Unfortunately, many others have disagreed, and the public debate is nowhere near settled. The situation is more analogous to the early climate change debates than to nuclear weapons. And even though a scientific consensus on climate did eventually emerge, most countries have done little of substance in response, because their incentives to ignore the consensus outweigh the perceived risk.

Without a real consensus on AI risk, the perceived payoff will continue to be enormous compared to the perceived risk. Trying to build a working AI ban on this foundation means a lot of wasted time and effort. At least with climate change, evidence eventually accumulated. With AI risk, by the time the evidence arrives, it will be too late to act on it.

IV. The proposed enforcers have a demonstrated track record of not enforcing things

For a treaty to function, its enforcement has to be credible: adversaries have to believe that violations will actually trigger consequences, up to and including the proposed airstrikes. If the US is seen as unwilling to follow through on stated commitments, the treaty is not going to be taken seriously[1].

Regardless of the administration or country in question[2], modern international politics seems to be characterized by a persistent pattern of maximum stated commitment followed by a face-saving partial retreat declared as success. Viewed charitably, this pattern reflects the objective difficulty of costly enforcement against nuclear-armed adversaries, which is exactly the difficulty an AI treaty would have to overcome, repeatedly.

V. GPU control is not analogous to nuclear material control

The treaty's proposed mechanism is control of high-end GPU clusters. This is far harder and less reliable than nuclear material control.

Weapons-grade uranium and plutonium are physically rare, and producing them requires large, specialized industrial infrastructure: centrifuge cascades emit distinctive mechanical vibrations, and enrichment facilities have identifiable thermal and radiological signatures. Production can therefore be monitored with a relatively small number of inspection sites, and one can't simply find a less conspicuous way to create nuclear weapons.

For AI training, GPUs are the current bottleneck, and the key word is "current". Training frontier models on CPUs is much slower per chip, but CPU clusters are far more accessible, geographically diffuse, and unmonitorable at scale. A country like Russia (with a functioning math education system that produces new research talent every year) could plausibly distribute training across tens of thousands of ordinary servers in many locations. Unlike uranium centrifuges, small server clusters emit no detectable signatures from orbit. More importantly, algorithmic advancements have repeatedly reduced computational requirements.[3] The one guaranteed result of the proposed treaty is a flowering of new, creative, unmonitorable alternatives to the current training methods.
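For a rough sense of scale, consider the back-of-envelope sketch below. The throughput figures are assumed orders of magnitude, not benchmarks, and the cluster size is hypothetical; the point is that the CPU route is merely expensive, not physically gated the way enrichment is.

```python
# Back-of-envelope: covert CPU training vs. a monitored GPU cluster.
# Throughput figures are assumed orders of magnitude, not benchmarks;
# only the ratio matters for the argument.

GPU_FLOPS = 5e14   # assumed sustained training throughput, one H100-class GPU
CPU_FLOPS = 1e12   # assumed sustained throughput, one commodity server CPU

gpus_in_monitored_run = 10_000  # hypothetical frontier training cluster
cpu_servers_needed = gpus_in_monitored_run * GPU_FLOPS / CPU_FLOPS

print(f"CPU servers for parity: {cpu_servers_needed:,.0f}")  # ~5,000,000

# Millions of ordinary servers are a procurement problem, not a physics
# problem: they can be split across thousands of unremarkable data centers,
# and every algorithmic efficiency gain divides the requirement further.
for gain in (10, 100, 1000):
    print(f"after a {gain}x algorithmic gain: {cpu_servers_needed / gain:,.0f} servers")
```

Five million servers is expensive, but it is the kind of expense a determined state can hide; a uranium enrichment program is not.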
VI. A flawed treaty is not better than nothing

The argument that even an imperfect treaty that buys two years is better than no treaty sounds reasonable in the abstract. Without a treaty, the AI race is unconstrained. However, the flaws in any such treaty will systematically favor power-hungry authoritarian countries, which automatically increases the odds of the worst possible outcome.

Which states usually comply with international agreements they find costly, and which don't? The Soviet bioweapons program, Biopreparat, continued in flagrant violation of the Biological Weapons Convention from 1975 until the early 1990s. The program employed tens of thousands of people and remained essentially undetected by Western intelligence for most of that period.

One almost certain consequence of the treaty is that the most risk-aware AI lab currently working at the frontier will stop its capabilities research. Another is that a certain former KGB officer will be jumping for joy in his secret underground bunker, not believing his luck.

If a flawed treaty means that Anthropic pauses while authoritarian programs continue, then the treaty is truly worse than nothing. A quick death from a genuinely misaligned ASI, built by anyone, is a terrible outcome. But there are things much worse than death, and one can only hope that an ASI trained under the supervision of professional torturers turns out misaligned and kills us all quickly enough.

So is there a better way?

There is one kind of action that seems valuable in the absence of a good solution: extending the scope of the search effort. Right now it seems that our search should be wider, more desperate, and larger in scale than it currently is. For example, we could:

- ask billionaires and governments to issue a huge number of grants to people willing to work on this problem, so that millions of smart people could stop going to their pointless jobs and try to solve it;
- encourage everyone already involved in AI alignment research to crowdsource tractable sub-problems to the wider public;
- encourage Anthropic to identify promising researchers among Claude users (Claude interacts with millions of people daily and could, with appropriate design, find users who show unusual cognitive skills).

In any case, the absence of a good solution is not an argument for a bad one. We need to look harder.

[1] The recent record does not inspire confidence.

[2] See also "China's final warning".

[3] Recent work on BitNet and Mixture of Experts has already demonstrated that very capable LLMs can be trained on hardware below the H100 tier.
