“We’re Still Doomed” is Weak Evidence Against Any Particular Past Doom-Aversion Plan
This is mostly a specific case of what Buck said here, but people keep doing it and I'm on a blogging streak, so you guys have to hear it again.

There's an argument I've heard around AI X-risk prevention which kinda goes like "We've tried [simple plan] and we're still doomed. Therefore we have to try [crazy plan] instead!" This is, in fact, bad. I'll give a couple of examples and then get into why.

Non-disruptive protests

I'm a big fan of non-disruptive protests about AI safety. I'm much less convinced by disruptive stuff. I once had a discussion with a fellow which went something like this:

Him: The protests you've done haven't worked; you should do disruptive stuff.

Me: There's no reason to think that those protests would work any better than ours, and they'd likely be less effective for [reasons].

Him: I don't understand why you're doubting a method whose efficacy is unknown (because we haven't done it yet) but supporting a method which we know doesn't work.

Me: So the way I think of this is that we have some unknown number of victory points that we need to achieve in order for humanity to survive, and we're chugging along gaining victory points at a fairly slow but low-variance rate, and your suggestion is like gambling all of our points on a small chance of winning, which seems like it might work, but also you have a bunch of adverse selection effects like the unilateralist's curse and optimism bias, so actually you're basically guaranteed to burn all the victory points for nothing.

My rapid production of a very large volume of words did shut him up, but I don't think it was a very useful discussion. I think the core difficulty is that he, in his gut, expected there to be a chance to save the world, while I, in my gut, expected there to mostly just be opportunities to win marginal points while the world keeps looking basically as grim as it ever did. I don't think the fact that we're still doomed is sufficient to prove that protesting was a bad plan, or that it isn't a good thing to keep doing.

MIRI-Stuff

I'm saying "MIRI-stuff" to mean the early agent foundations/decision theory/logical induction work that MIRI did to try to solve alignment. I've heard people say that this was a bad idea, and point to the fact that they didn't succeed as evidence of this. I don't think that's fair. MIRI's stuff has been extremely useful for my thinking. It's true that MIRI didn't solve alignment, but this seems to mostly be because alignment was extremely hard. I think MIRI-ish stuff is still one of the most important avenues for research.

X Won't Work, so Y

Often, people go a step worse and take our continued doom as evidence for their own pet project. We saw that explicitly in the first case. You also see it between political and technical approaches to AI alignment: "We won't get a pause, so you should do alignment," or "We won't get alignment, so we should do control," or "We won't get a technical solution, so you should do activism."

In these cases, people are falling into the no-maybe-yes fallacy, which is a quirk of the human brain: we tend to bucket events into "won't happen" (i.e. probability too low to be worth thinking about), "might happen" (i.e. probability intermediate, track both options), and "will happen" (i.e. probability so high we don't need to track the case where it doesn't happen). They squish one small probability (that the thing they don't like works) into the first category.
Then, by the intuition that there should be some good plan, the second small probability (that their preferred plan works) has to remain in the "maybe" bucket.

Of course, it probably is worth calculating which plan is most likely to succeed, but don't use your intuitive no-maybe-yes trilemma machinery on each plan individually, and definitely don't run it on just the plan you don't like!
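To make the bucketing distortion concrete, here's a toy sketch in Python. The success probabilities and the 5%/95% bucket thresholds are numbers I made up purely for illustration, not estimates of anything real:

```python
# Toy model of the no-maybe-yes fallacy: collapse probabilities into
# the intuitive "won't / might / will" buckets, then compare two plans.

def bucket(p, wont=0.05, will=0.95):
    # Made-up thresholds: under 5% rounds to "won't happen",
    # over 95% rounds to "will happen", everything else is "might".
    if p < wont:
        return "won't happen"
    if p > will:
        return "will happen"
    return "might happen"

# Hypothetical success probabilities for two long-shot plans.
plans = {"plan I dislike": 0.04, "my pet plan": 0.06}

for name, p in plans.items():
    print(f"{name}: p = {p:.2f} -> {bucket(p)!r}")

# plan I dislike: p = 0.04 -> "won't happen"
# my pet plan:    p = 0.06 -> "might happen"
```

Two nearly identical probabilities land in different buckets: the disliked plan gets rounded down to impossible while the pet plan survives as a live "maybe." Comparing the raw numbers side by side, instead of bucketing each plan separately, avoids the distortion entirely.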
