Don’t Cut Yourself on the Jagged Frontier
(With apologies to Sean Herrington, who deserves a better playwright than yours truly)
A conversation with a friend on the bus to Bodega Bay today made me realize that there are some holes in my thinking about safety and superintelligence. I’ve assumed that superintelligence is by definition robustly better than humans at all the things, but there are situations where that assumption breaks down.
Without further ado, for your edification and discomfort, The Strawman Players present:
A Disquieting Conversation on a Bus
Vulpes: I’ve been worrying lately about well-aligned superintelligence.
Corvus: That seems like a strange thing to worry about.
Vulpes: You’d think so. But hear me out. I’m imagining a world where we develop a well-aligned superintelligence (let’s call it MegaBrain) that is omni-benevolent and wants only nice things for us.
Corvus: I notice in myself a distinct lack of anxiety.
Vulpes: But here’s the thing. As part of its mission to serve humanity and give us nice things, MegaBrain develops a cool new technology to make our lives better. The details don’t really matter—for the sake of argument, let’s say it invents a Black Hole Reactor that uses micro black holes to generate infinite clean energy.
Corvus: Still not feeling anxious.
Vulpes: What if MegaBrain is smart enough to develop the Reactor, but too dumb to use it wisely? Perhaps it doesn’t realize that eventually some of the black holes will escape and gradually eat the earth. By the time anyone realizes, it’s too late and the earth—and humanity—are doomed.
Corvus: Ah, I see your mistake, friend Vulpes. You have made the common error of not understanding what “superintelligence” actually means. People often make the mistake of thinking that a superintelligence will be like a mad scientist: brilliant in some ways, but shockingly dumb in others. But that isn’t how it works: by definition, “superintelligent” means better than humans in every possible way.
MegaBrain, being superintelligent, will be good not only at designing new technology, but also at understanding how to deploy it safely. If a human could figure out a safe way to test the black hole reactor before deploying it, then MegaBrain could do even better.
So there’s absolutely nothing to worry about.
Vulpes: I’m not sure that’s actually true. What about the jagged frontier?
Corvus: What about it?
Vulpes: AI capabilities are likely to be jagged even as they increase. So perhaps MegaBrain can invent the Reactor because it has superhuman intelligence, but it makes a catastrophic mistake during deployment because it has subhuman wisdom.
Corvus: I suppose that’s possible, but it doesn’t seem very likely. The capability frontier is jagged, but it’s moving fast. Surely there will only be a brief period when MegaBrain is smart enough to build the Reactor but unwise enough to deploy it prematurely. There’s only a tiny window of time when anything can go wrong.
Let me revise my earlier statement: there is almost nothing to worry about.
Vulpes: I just thought of another problem.
Corvus: I’m sure you did.
Vulpes: Here’s the thing. Imagine that MegaBrain is installed at the Department Of Maximum Energy (DOME), and DOME is excited to find new energy sources. So they put MegaBrain to work on designing the Reactor, but when it tells them it would be too dangerous to deploy, they ignore it because they’re too eager to deploy this cool new energy source.
Even though MegaBrain is superhuman in every possible way, DOME plus MegaBrain collectively have a jagged frontier. Together, they are smart enough to design the Reactor, but foolish enough to bungle the deployment.
Corvus: Ah. That feels… uncomfortably plausible.
Vulpes: How’s your anxiety doing?
Corvus: Are we there yet?
