Opinion

Bridge Thinking and Wall Thinking

There are a couple of frames I find useful for understanding why different people talk very differently about AI safety – the wall and the bridge.

A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the better. If you are adding a brick to the wall, you are doing something good, regardless of the current state of the wall.

A bridge requires a certain amount of investment. There's not much use for half a bridge. Once the bridge crosses the lake it can be improved – but until you have a working bridge, you have nothing.

A solid example of wall thinking is the image in this thread by Chris Olah. Any approach built around "eating marginal probability" involves a wall frame. Another example is the theory of change of the standards work I've done for Inspect Evals, which I would summarise as: "Other fields like aviation and rocketry have solid safety standards and paradigms. We need to build this for evaluations – it's the kind of thing a mature AI safety field needs to have." This theory doesn't have a full end-to-end story of how it helps save the world, but under the wall frame it doesn't have to – it just has to point in the right direction and be broadly helpful.

A good example of bridge thinking is the MIRI approach – openly and outright asking for an international treaty. In my understanding, MIRI are not asking for the most they think they can get away with. They are asking for what they think is necessary to solve the problem, and believe anything less is insufficient. In lieu of p(doom), Eliezer asks, "What is the minimum necessary and sufficient policy that you think would prevent extinction?" This is bridge thinking: we need to achieve a certain outcome X, and anything less than X won't do. Anything that has no chance of achieving X or greater is unhelpful at best and counterproductive at worst. And to figure out what X is, you need a solid idea of your high-level goal from the start, and of how a given course of action gets you there.

From the wall perspective, bridge thinkers are ignoring or denigrating important marginal or unglamorous work in favor of swinging for the fences. From the bridge perspective, wall thinkers are doing things that are not helpful and will end up rounding down to zero.

I have found this a very useful way to understand why some people in AI safety propose very different ideas than I do.