An Informal Definition of Goals for Embedded Agents
This post was written as part of research done at MATS 9.0 under the mentorship of Richard Ngo.

You can conceptualise an embedded agent as inducing a partition[1] of the world into “the agent”, “the external world”, and the dynamics that mediate their interaction; these dynamics include observations and actions. The agent has beliefs, which can be thought of as a generative model of the world. This world model contains a generative self-model of “the agent” and of its relationship to the world through the intermediate dynamics[2].

An agent’s self-model contains events that are both probable and statistically dependent on her actions. They are likely, but only if she acts to make them happen. These events are her goals.

[1] See Demski 2025 and Critch 2022 for mathematical treatments of partitions.
[2] This model may or may not be interpretable.
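The informal definition of goals above can be illustrated with a toy sketch. This is only one possible reading, not a formalisation from the post: it assumes the self-model is a finite table with a policy over actions and a probability for each event given each action. An event is treated as a goal when it is likely under the self-model and its probability varies strongly with the action taken. The names `marginal`, `action_dependence`, `goals`, the two thresholds, and the example numbers are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy self-model (all numbers are illustrative assumptions):
#   policy[a]        = probability the agent assigns to taking action a
#   likelihood[e][a] = probability of event e given action a
Policy = Dict[str, float]
Likelihood = Dict[str, Dict[str, float]]


def marginal(policy: Policy, likelihood: Likelihood, event: str) -> float:
    """Probability of the event under the self-model, averaging over the policy."""
    return sum(p_a * likelihood[event][a] for a, p_a in policy.items())


def action_dependence(likelihood: Likelihood, event: str) -> float:
    """Crude dependence measure: spread of P(event | action) across actions."""
    probs = list(likelihood[event].values())
    return max(probs) - min(probs)


def goals(policy: Policy, likelihood: Likelihood,
          likely: float = 0.9, dependent: float = 0.5) -> List[str]:
    """Events that are likely, but only because of how the agent acts.

    Both thresholds are arbitrary choices made for this illustration.
    """
    return [
        e for e in likelihood
        if marginal(policy, likelihood, e) >= likely
        and action_dependence(likelihood, e) >= dependent
    ]


policy = {"make_coffee": 0.95, "do_nothing": 0.05}
likelihood = {
    "has_coffee": {"make_coffee": 0.98, "do_nothing": 0.02},       # likely and action-dependent
    "sun_rises": {"make_coffee": 0.999, "do_nothing": 0.999},      # likely, but not up to her
    "wins_lottery": {"make_coffee": 1e-6, "do_nothing": 1e-6},     # neither likely nor dependent
}

print(goals(policy, likelihood))
```

In this toy model only `has_coffee` qualifies: `sun_rises` is probable regardless of what the agent does, and `wins_lottery` is improbable either way. The max-minus-min spread is a deliberately simple stand-in for statistical dependence; a measure such as mutual information between action and event would be another reasonable choice.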