Published on February 17, 2026, 7:51 PM GMT

Epistemic status: Speculation on agent foundations research culture (which I am pretty deeply engaged with) and on whether “we are confused about agency,” about which I am not sure. I will take for granted that this is a common refrain, which should be familiar to anyone who is part of the relevant scene.

The phrase “We are confused about agency,” often with intensifiers such as “way too,” “deeply,” or “dangerously,” is a common membership signal for a certain AI safety research culture. Roughly speaking, this is the culture of the agent foundations research program that accreted around MIRI.[1] The phrase is usually supplied as an argument for delaying the development of AI until certain mathematical research (particularly in learning/decision theory) has been carried out. I find the phrase uncomfortable on various levels.

As Cultural Signal

Since I claim that the phrase (apart from its literal meaning) functions as an in-group signal, it is natural to wonder what “we” refers to here. I believe the intention is “we = everyone.” I think that it requires a pretty serious level of scholarship to confidently claim that everyone is confused about agency. One general frustration I have with rationalist culture is the loner/outsider attitude that emphasizes cleverness over scholarship and generally underestimates the sometimes-slow but cumulative progress of academia.[2] I will discuss in later sections whether the phrase is true under the (typically intended?) “we = everyone” interpretation, which is treated as the default outside of this section.

Another natural interpretation is “we = humanity,” which is substantially different, because humanity is frequently confused about something that many individuals (and often whole professions) understand clearly. This would suggest drastically different interventions (e.g. education, communication). Interestingly, MIRI is now focused on such interventions.
Arguably, MIRI is sufficiently less confused about agency than humanity is to at least recognize that no one is prepared to build artificial superintelligence (ASI) safely. I think that this meaning is usually not intended, because the phrase is usually offered to support the relevance of some particular research agenda, not a communication effort. Alternatively, “we” could mean the agent foundations community, or just “me,” which would be a narrower but not necessarily more correct claim. Ironically, I think that something like the opposite of this interpretation of the phrase is often load-bearing: “The frontier labs, as opposed to us, are too confused about agency to be trusted with ASI.”

Confused in What Sense?

The word choice “confused” is part of the in-group signal, recalling Noticing Confusion. Is it really appropriate here? I feel that it does not quite fit. Confusion goes beyond alternative word choices such as “uncertainty” or “ignorance.” It suggests that we do not even have the right concepts to begin talking about agency (which is roughly what John Wentworth argues here).

I think this is not only an overstatement, but probably (at ~65% confidence) just an incorrect description of our epistemic state. It is an overstatement because there is actually an extensive literature on agency from various perspectives (statistics, decision theory, economics, game theory, artificial intelligence, learning theory, algorithmic probability, etc.) which share many powerful common concepts and principles (such as probability theory). Though it is very difficult for one individual to study each of these areas in enough depth to see the connections clearly (and I don’t claim to have achieved this yet), it is possible to do so, and the existing knowledge seems very likely sufficient to explicate (most of) the necessary concepts. It is hard for me to view this epistemic state as pre-paradigmatic.
At least, it seems very hard to rule out that “agency” is relatively intricate and multi-faceted, rather than “confusing.”

My stronger claim is that confusion is the wrong description not only in degree but in kind. It seems that “confusion about agency” is like “confusion about (biological) life,” and this is not by accident. There are many underlying principles of life (such as Darwinian evolution). There are also many commonalities between (most?) forms of life, such as carrying genetic information encoded as DNA. There are common limitations, from thermodynamics, on what life forms can achieve. But there are also many diverse forms of life which use an array of distinct biological mechanisms, some more confusing (to us) than others. I think the same is true of agency; it appears in various forms, through various interesting mechanisms, which tend to solve common problems in similar ways while constrained by certain hard limitations. For more details, see my “meta-theory of rationality” sequence.

Confused About What?

While I would not say that we are confused about agency, I think we have a lot of specific confusions about certain aspects of agency. It is best to narrowly characterize which ones we are talking about. If the goal of agent foundations research is to solve AI safety, it is best to focus on the confusions that seem relevant to that effort.[3]

For example, a theory of communication and concepts may aid value alignment and interpretability. It is easier to design a powerful but opaque learning algorithm than it is to design a powerful learning algorithm with a knowledge base we can actually read. I think we are still confused about exactly what that would even mean, and even about whether it is possible (and in what sense).

From a decision-theoretic standpoint, corrigibility and other forms of unambitiousness also seem difficult and maybe even confusing, but essential to ASI going well in practice.
What does it mean for one system to reliably allow correction by another?

In fact, all of these topics seem to fall under a general category that Abram Demski might call “understanding trust.” Roughly speaking, communication is about our trust in an AI system, and corrigibility is about its trust in us. The precise nature of this connection, in itself, also seems confusing (to me).

I think that experienced agent foundations researchers have already internalized these lessons, and the established agendas I am aware of pretty much all carve off some particular confusion about agency which seems relevant to safety.[4] But the overall discourse should be improved.

[1] Gretta Duleba has described it as the “Alignment-Is-Hard Cluster,” but I am pretty sure that the phrase “We are confused about agency” is better Bayesian evidence for membership than “Alignment is hard.”

[2] Dan Murfet impressed this on me.

[3] Of course, following curiosity is a useful local research strategy. But this line of argument frequently justifies cope.

[4] Vanessa Kosoy and Alexander Appel come to mind as exceptions, in the sense that they seem to be attacking overall confusion about agency pretty much directly. But they’re very good at it, and I think that they (in particular) should keep doing what they are doing. The correct number of researchers taking this path is not zero.
“We are confused about agency”

