I was thinking about Habryka’s article on Putin’s CEV, but I am posting my response here, because the original article is already 3 weeks old.

I am not sure how exactly a person’s CEV is defined. “If we knew everything and could self-modify” seems potentially sensitive to the precise chronological order of “realizing things” and “self-modification”.

Like, imagine Hitler getting the godlike powers of knowledge and self-control. If he gets the perfect knowledge of economics, sociology and psychology first, he could go like: “Oh, now I realize that the things I blamed on the Jews are actually caused by something else. How embarrassing. No more anti-semitism, but I better erase everyone’s memory first.”

But it is also possible that he gets the self-control first, and he realizes that there is such a thing as value drift, and thinks: “Oh my, this could accidentally make me more similar to the Jews. I better hardcode the Nazi ideals in myself immediately, and also give myself blond hair and blue eyes.” And using the superior knowledge, he hardcodes the Nazi values in himself so that they are reflectively stable and survive all updates.

So, Hitler’s CEV seems to depend on the technical details: in which order he gets the new knowledge and the new skills. He could end up either CEV-nice or a CEV-monster.

Seems like “knowledge first, self-modification next” is the preferred order, but that kinda assumes perfect rationality at the beginning. I mean, perfect knowledge without perfect rationality would probably be prone to confirmation bias and other biases. So we might want perfect rationality (or merely improved rationality) first, but making ourselves more rational is already in the realm of self-modification.

Second, it seems to me that Habryka chooses between two models in the article: Either everyone is CEV-nice, or almost everyone is CEV-nice but a few people such as Putin are rare CEV-monsters.
Then he concludes, in my opinion correctly given the premises, that Putin doesn’t seem to be that exceptional. Therefore, he is probably CEV-nice.

(By “CEV-nice” I mean: given godlike powers of knowledge and self-modification, he will ultimately become a benevolent God. There may be a few atrocities in the process, but at some moment he will realize that there are no more threats, and therefore no strategic reasons to treat other people badly. And, getting the strategic reasons out of the way, there are basically no other reasons to hurt people. And by a “CEV-monster” I mean someone who will end up hurting people for non-strategic reasons even after feeling perfectly secure in their godlike powers.)

If this is correct, then I simply reject the premise. I think that although most people are probably “CEV-nice”, there are also quite a few “CEV-monsters”, i.e. people who value suffering (of others) for the sake of suffering. I don’t know how many, but as a very rough estimate, let’s say between 5% and 50%? Under that premise, it doesn’t seem that unlikely that Putin would happen to be one of them. (Or Trump, etc.) I would assume that monsters are over-represented in the positions of power, simply because on the way to the top there are many situations where people have to choose between hurting someone and losing an opportunity to gain more power, so the intrinsically nice are at a disadvantage.

I would also add a third category that I will call “CEV-insane”: someone who after obtaining godlike powers would destroy everything that we value, even for themselves. For example, someone who believes that death gives meaning to life, and that intelligence is a source of misfortune, so he magically establishes a law that everyone will be mortal and of average intelligence. Or a Buddhist who believes that life is suffering, and there is no such thing as “self” anyway, and decides that omnicide is the right way to go, no more dukkha.
Or some eco-fanatic who decides that Homo sapiens or intelligence itself is a problem, and must be eradicated. Or simply a person who does self-modification wrong, and destroys some of their essential human qualities, while retaining the ability to decide the fate of the universe. I think there are also enough people like this.

I admit that these are very dark ideas, but looking around me, it seems like the world we live in is indeed quite dark. It’s not like people are born good or bad (although, as far as I know, e.g. psychopathy is hereditary to some degree), but more like we move towards some attractors that are self-reinforcing enough to keep us there even after the original forces are gone. Good people keep wishing they remained good, and might even self-modify towards more good, given the tools. But assholes who don’t give a fuck will see no reason to self-modify into someone who gives a fuck. (This is why a group CEV seems like a safer option, if there are some good people in the group, because the good people might choose the good for everyone else, while the assholes might decide that they don’t care either way as long as they are left alone.)
Many individual CEVs are probably quite bad
