Opinion

Realizability for Finite State Reactive Agents

​Published on February 15, 2026 11:38 AM GMTThis work was produced during the Dovetail Research Fellowship. I’m grateful to Alex, Alfred, Winter, and Santiago, and to everyone who provided feedback, and to the fellowship community for their discussions.AbstractWe study AI agents through the lens of finite-state transducers and string-to-string behaviors. Instead of treating automata purely as language recognizers, we model agents as systems that map observation histories to action histories while interacting with an environment online. We introduce Finite-State Reactive Agents (FSRAs) as deterministic, finite-memory, reactive policies, establishing their equivalence to standard Mealy and Moore machines. Our main result (Theorem 3.13) characterizes exactly which behaviors can be implemented with finite internal state: a deterministic behavior is FSRA-implementable if and only if it is length-preserving, prefix-preserving (causal), and has finite memory, & relate FSRAs to standard Mealy and Moore machines and identify when richer computational models are unavoidable.  We also introduce specifications (constraints on allowed agent behaviors) and formalize notions of weak enforceability, strong enforceability, and realizability. Using our characterization theorem, we prove strict separations between these notions (Theorem 3.15), demonstrating when finite-state agents can or cannot satisfy given specifications. IntroductionDeterministic Finite Automata (DFAs) are usually introduced as a basic model of computation: simple machines that read an input symbol by symbol and decide whether to accept or reject a string. In theoretical computer science, they are often treated as a warm-up before more powerful models such as Turing machines.In this work, we take a different perspective. A DFA can also be viewed as a minimal agent: it maintains a small internal state, receives observations from the environment, updates its state deterministically, and eventually produces an outcome. Despite their simplicity, DFAs already capture the core interaction loop shared by many AI systems: observe → update internal memory → act or evaluate.This observation motivates a shift from language recognition to agent behavior. We introduce the model of a Finite-State Reactive Agent (FSRA), which generalizes DFAs from passive evaluators to active, online systems that map streams of observations to streams of actions using finite internal memory. FSRAs formalize deterministic, reactive agents whose behavior is causal and prefix-preserving, and whose internal state evolves according to a fixed update rule (time-invariance). They let us ask precise questions about memory, state abstraction, observability, and goal satisfaction without the complications of learning (updating the policy from data), optimization (searching over policies or parameters to maximize an objective), or stochasticity (randomness in the agent’s actions or environment). These characterizations let us state simple, checkable conditions for when a specification is enforceable or realizable by a set of finite-state controllers.Outline. The rest of this post is organized as follows. Section 1 introduces the Finite-State Reactive Agent (FSRA) model and states its equivalence with standard Mealy and Moore machines. Section 2 develops formal properties of agent behaviors: determinism, length-preservation, causality (prefix-preservation), finite memory, reactivity, and time-invariance, modeled as properties of string-to-string functions. It concludes by introducing specifications (constraints on allowed behaviors) and three enforceability notions: weak, strong, and realizability. Section 3 presents our main results: Theorem 3.13 characterizes exactly which behaviors are FSRA-implementable (reactive + finite-memory), Corollary 3.14 shows that such behaviors induce realizable specifications, and Theorem 3.15 proves strict separations between the enforceability notions, demonstrating fundamental gaps between what finite-state agents can weakly satisfy versus fully realize. We conclude with worked examples including binary remainder computation (mod k), substring avoidance (no “11”), and count-with-ending constraints. Standard automata-theoretic background (DFAs, Mealy machines, Moore machines) and deferred proofs are collected in the Appendix.Finite State Reactive AgentMealy machines (Definition A.2 in the Appendix) are deterministic finite-state transducers that produce an output symbol synchronously with each input symbol.  This makes them a natural formal model for very simple online agents: the input alphabet corresponds to the agent’s observations, the finite state encodes the agent’s memory, and the per-step output is the agent’s immediate action.  To make this perspective explicit and to use language familiar to AI and alignment readers, we adopt the name Finite-State Reactive Agent (FSRA) for a Mealy machine interpreted as a deterministic, finite-memory, online policy.  The formal Definition 1.1 below gives the FSRA tuple and the precise causal / prefix-preserving behavior we require.Definition 1.1 (Finite State Reactive Agent – FSRA)A Finite-State Reactive Agent (FSRA) is a deterministic, finite-memory, online transducer that maps each input (observation) prefix to an output (action) prefix of the same length i.e. it emits exactly one action symbol for every observation symbol it consumes.Formally, an FSRA is a tuple , where is a finite set of states (the agent’s internal memory), is a finite observation alphabet (input), is a finite action alphabet (output), is the (total) state-update function, is the output (action) function, producing exactly one action per observation, is the start state.The FSRA implements a functionthat is length-preserving:and causal / prefix-preserving: the -th output symbol depends only on the first  input symbols, and for any prefix  and continuation ,  is a prefix of .Concretely, on input  starting from , define the state sequence with  given. The produced action sequence is, so .Remark (generalization). If you want to allow a variable number of action symbols per observation, simply let . That yields the Mealy/transducer formulation where one transition may emit a finite output string; the FSRA above is the special case that is strictly one action per observation.Calling this model an FSRA makes the intended interpretation explicit:   are observations,  are actions, and the pair  together specify a deterministic, finite-memory policy. An FSRA captures the simplest kind of online agent (these terms are discussed in detail in the next subsection): it is reactive (length- and prefix-preserving, emitting exactly one action per observation), has finite memory encoded by the finite state set Q, and is deterministic.This model is useful for reasoning about what can be implemented with finite memory, and for making clear where finite-state agents fail. When one wants to model policies with longer or variable delay between observation and action, or policies that maintain richer beliefs, the FSRA can be generalized (e.g.  or by increasing , but ultimately some behaviors require strictly more expressive models (e.g. pushdown automata, Turing Machines, or stochastic/learning agents).Proposition 1.2 (Equivalence with Mealy / Moore)Let  be a finite observation alphabet set and  a finite action alphabet set.Every FSRA is, by definition, a standard Mealy machine (with input alphabet , output alphabet , and single-symbol output per transition – Definition A.2′ in the Appendix). Conversely, every standard Mealy machine (whose output function produces exactly one symbol per input) is an FSRA. This equivalence is immediate from the definitions.For every FSRA  there exists an equivalent standard Moore machine (Definition A.3′ in the Appendix) computing the same length-preserving transduction , and conversely, every standard Moore machine can be converted into an equivalent FSRA.Brief justification. Part 1 is just a renaming of components. For Part 2, the FSRA-to-Moore direction requires splitting each state  into copies  – one for each output symbol a that  can produce, so that the output can be read from the state alone; this can increase the state count up to . The Moore-to-FSRA direction works by defining  where  is the Moore output function, moving the output from the arrived-at state to the transition that enters it.Formal Properties of Agent BehaviorsIn this section, we define several properties of general observation-to-action behaviors, modeled as string-to-string functions. We then identify which of these properties are satisfied by Finite-State Reactive Agents (FSRAs).Let  and  be finite alphabets, representing observations and actions respectively. We consider functionswhere  denotes the collection of nonempty subsets of . Such a function assigns to each finite observation history a set of admissible action histories.Later, we will specialize to deterministic behaviors, which are functions of the formExample (Behavior)Let . Define a behavior  as follows:In this example, the behavior is nondeterministic: after some observation histories, multiple action histories are permitted.Intuitively, a function  describes everything an agent is allowed to do, rather than what it will do. For each observation history,  lists the action histories that are consistent with some specification, constraint, or design goal. Deterministic agents correspond to the special case where exactly one action history is allowed at each observation history.It is useful to separate behaviors from agents. A behavior  specifies which action histories are acceptable after each observation history, without committing to a particular decision rule. An agent, by contrast, must choose a single action at each step.We also distinguish both from a specification: a specification P is a constraint on what an agent is allowed to do (formally, a relation ; see Definition 2.5), whereas a behavior is what the agent actually does. This three-way distinction: specification, behavior, agent, lets us ask precise questions. For example: “Can a simple agent satisfy this specification?” (i.e., does there exist an agent whose behavior stays within the allowed set?). In the formal-methods and control-theory literature, this is called enforceability: a specification is enforceable by a set of agents if at least one agent in that set produces only allowed behaviors. We can also ask the converse: “Can a set of agents collectively realize every behavior the specification allows?” These notions are formalized in Definition 2.11.Allowing  to be nondeterministic is intentional. Many natural specifications such as safety constraints, permissive policies, or sets of acceptable plans do not prescribe a unique action at every step. Instead, they describe a space of allowed behaviors. Definition 2.3 (Deterministic) A function  is deterministic if In this case, we may write  and identify  with the unique element of the singleton set. Determinism means that given any observation history, the action sequence is uniquely determined. Note that a deterministic behavior effectively is an agent’s policy, the distinction between “set of allowed action histories” and “committed choice” collapses when the allowed set is always a singleton. The behavior-vs-agent distinction is most useful in the nondeterministic case, where a behavior permits multiple options and the question becomes which agent (i.e., which deterministic selection) to deploy. Example (Deterministic)Let  and . DefineSo, the function returns  and  alternatively.Determinism is the simplest way to turn a permissive specification into an actual policy: once you pick a deterministic  you no longer have a set of allowed behaviors but a single thing the agent will actually do (to follow the behavior). This is useful when reasoning about concrete guarantees (e.g., “will this agent ever output bad?”) but remember determinism is a modeling choice, real systems are sometimes nondeterministic (stochastic) by design.Definition 2.4 (Controller and Controller Set) A controller is any concrete system (a finite automaton, a program, a circuit, a neural network, etc.) that determines a behaviorwhere  is the set of action histories the controller may produce after observation history o.  We deliberately leave “concrete system” informal: the only thing that matters formally is the induced behavior .A controller set  is a set of controllers; we write  and use the shorthand  for the mapping from controllers to their induced behaviors. (No additional structure on  is assumed, it is simply the collection of controllers under consideration, e.g., “all FSRAs” or “all 3-state Mealy machines.”)If each   is single valued we call the controller deterministic and identify Example (Controllers)Let  be the set containing two deterministic behavior/functions with:– .Then  is a controller set that can produce at least depending on which controller is selected.Definition 2.5 (Specification)Let  and  be finite alphabets, and letbe a specification (a relation between observation histories and allowed action histories).  For an observation history  write the fiber (i.e., the set of all action histories that the specification allows after ):A specification is not the same as a behavior. A behavior describes what a controller does; a specification describes what a controller is allowed to do. In particular,  can contain many action histories, it expresses a constraint (“any of these actions would be acceptable”) rather than a commitment (“this is the action I will take”). A single controller typically covers only some of the allowed pairs. The enforceability and realizability notions in Definition 2.11 make this gap precise: they ask whether controllers in a set  can, individually or collectively, cover what  allows. Let  be a set of controllers; each controller  induces a function (behavior)so that is the set of action sequences controller  produces on observation history . Define the relation implemented by  asObserve that .Think of the controller as the machine you could put on the robot: it has internal structure and a rule for mapping sensory input to motor outputs. The behavior ​ is the abstract trace of possibilities that a machine can generate. Distinguishing these levels is crucial: a specification can permit many possible action traces (multivalued), but only some will be reachable by your finite-state controllers. The phrase “is it enforceable?” asks whether the machines you can actually implement can realize at least one of the allowed traces at each point; “is it realizable?” asks whether the machines collectively capture exactly the specification.Definition 2.6 (Length Preserving)A function  is length-preserving if   For deterministic  this reduces to Example (Length-preserving)Let ,  and definei.e.   Then . Intuitively, a length-preserving function produces exactly as many output symbols as it consumes input symbols. This models an agent that emits one action per observation (or more generally, a fixed number of actions per observation). It’s an explicit modeling choice: many controllers (especially embedded or real-time ones) naturally fit this model.Definition 2.7 (Prefix Preserving/Causal)A function  is prefix-preserving (or causal) if For deterministic  this is equivalent to Causality captures the notion of online computation: the output produced after reading a prefix  must remain consistent (as a prefix) with the output produced after reading any extension . In other words, the agent commits to its early outputs and cannot retroactively change them when new observations arrive.Example (Prefix-preserving / causal)Let ,  and define  by the stepwise rule:on seeing  output ; on seeing  output .Formally, .Then  is a prefix of  for every .Definition 2.8 (Finite Memory)A deterministic function  has the finite-memory property if the equivalence relation  defined byhas finite index (i.e., finitely many equivalence classes).The relation  partitions observation prefixes into equivalence classes that cannot be distinguished by any future continuation. If this partition is finite, then the agent effectively needs only finite memory to encode its past: it suffices to remember which equivalence class the history belongs to. This is the string-function analogue of the Myhill-Nerode theorem for languages.The finite-memory condition is the formal version of “bounded internal state.” It says there are only finitely many distinct ways the future can unfold after different pasts, so keeping one of finitely many summaries is enough. This is exactly why finite automata are a natural model for resource-limited agents.Example (Finite Memory – Periodic every ) Fix . Let . Define deterministic  byi.e. the symbol emitted at step  depends only on the position  (every -th output is ).Claim.  has finite memory; the index of  is exactly .Proof. Define relation (R) on prefixes byThere are at most  classes under . If , then for every continuation  the positions in  where outputs equal 1 are the same (they are exactly those indices congruent to ). Hence . Thus  has at most  classes. Conversely, two prefixes with different lengths modulo  produce different future outputs (their next few outputs disagree), so all  residue classes are distinct. Therefore  has exactly  classes, so finite index. Example (Not Finite Memory – Perfect Square Count Predicate)Let . Define deterministic reactive  by the stepwise rule:at each step emit  iff the total number of s seen so far is a perfect square; otherwise emit .Formally, if after reading prefix  the count of 1s is , then on reading a next symbol the output at the time the -th input is processed depends on whether  is a square.Claim.  has infinite index (so  is not finite memory).Proof for this claim can be found in appendix Definition 2.9 (Reactive)A behavior  is reactive if it is length-preserving, andprefix-preserving, i.e.  ( is a prefix of  ). Equivalent characterization for deterministic behavior: A behavior    is reactive if and only if Reactivity does not imply memory-lessness or finite memory. A reactive behavior may depend on the entire observation history , not just a bounded summary of it. (In particular, “finite memory” is strictly more general than the memoryless/Markov case, where the action depends only on the most recent observation: a finite-memory agent compresses the full history into one of finitely many states, each of which can encode patterns spanning many past observations.)Reactivity formalizes the intuitive idea that an agent must act in real time. Once an action is taken, it cannot be retracted when new information arrives. This matters for enforceability: a specification may allow some full action sequence in hindsight, yet no reactive agent can realize it because early commitments inevitably block all allowed continuations. Prefix-wise enforceability makes this tension explicit by asking whether an agent can always keep some allowed future open while acting online. We will see these definitions more formally later.Example (Reactive)Same as the prefix-preserving example above:  maps observations to immediate outputs, so . This is reactive and causal by construction.Example (Length-preserving but non-reactive behavior)Let  and . Define a deterministic function where the output string has the same length as . If  denotes the empty string then . This behavior emits the correct number of actions, but it does not act online. The first action symbol depends on whether the entire observation history has even or odd length, which cannot be determined when the first observation arrives. As a result, the agent’s early actions are not commitments it may retroactively “change its mind” once more observations arrive. This violates the core idea of reactivity: acting step by step in real time.Example (Reactive but not finite memory)Let , . Define a deterministic reactive  by the stepwise rule:at each step output  if the total number of 1s seen so far is a perfect square, otherwise output .Formally , where  depends on the count of 1s in .This  is reactive (stepwise defined) but not finite-memory, since tracking “is count-of-1s a square?” requires unbounded memory.This example shows reactivity alone doesn’t bound memory cost. The agent behaves online (each step emits one output) yet the rule depends on an unbounded statistic (counts), so no finite-state controller can implement it. The properties above (length-preserving, prefix-preserving, finite memory) are exactly the ingredients of our main FSRA characterization theorem (Theorem 3.13). We will state and prove it in Section 3. Before that, we introduce the idea of enforceability and realizability, which we will use later when talking about which specifications such agents can satisfy. Definition 2.10 (Time-invariance)A family of online behaviors may be specified by a step-indexed family of per-step update/output rules , where each  maps a current internal state and the new observation to a next state, and  maps the current state and observation to the emitted action. We call the behavior time-invariant (or stationary) if the per-step rules do not depend on ; that is, there exist maps  and  such that Time-invariance adds the further restriction that the decision rule doesn’t secretly depend on how many steps have elapsed.Definition 2.11 (Enforceability and Realizability).Weak enforceability: We say that specification is weakly enforceable by a set of controllers  if equivalentlyAnother way to define would be like thiswhenever the specification allows some action at history , at least one controller in can produce an allowed action at . The controller that works may depend on .Weak EnforceabilityStrong enforceability: We say that  is strongly enforceable by a set of controllers  if equivalently every action allowed by the specification at  is realized by some controller in  (the realizing controller may depend on the action and on ).Strong EnforceabilityRealizability: We say that  is realizable by a set of controllers  if i.e. for every  there exists some controller  with , and conversely every  emitted by some controller in  belongs to .Deterministic simplifications. When  induces a set of deterministic function i.e. , then we can just write:For Weak enforceability:         equivalentlyFor strong enforceability:           equivalently  Special case: singleton controller. When  is just a singleton set i.e.  then the definitions three definitions just simplify to: Weak Enforceability reduces to  and in the deterministic case this is  whenever . Strong Enforceability reduces to  (i.e. every allowed action must be produced by the single controller), which in the deterministic case means  is either empty or equals .Realization reduces to Enforceability compares an abstract specification  (what counts as allowed action histories after each observation history) to what a set of concrete controllers  can actually produce. Weak enforceability asks only that, whenever the spec allows at least one action after a history , some controller in  can produce at least one allowed action at  (the controller may depend on the history). Strong enforceability demands more: every action that the spec allows at  must be produced by some controller in  (so the controllers collectively realize every permitted action). Realizability asks that the set   exactly matches the specification: the set of all (history, action) pairs realized by controllers in   is equal . These notions let us formalize intuitions like “the spec is implementable by simple reactive agents” and to separate cases where a spec is merely compatible with controllers from cases where it is exactly captured by them.Example (Weak but not Strong)Let  Define the specification (For every other observation string .)Let the controller set be the singleton  with deterministic behavior  given byThen:For strong enforceability we require  with . But  and no  produces  ; hence  is not strongly enforceable by .Realizability asks . Here  ; thus is not realizable by .Example (Strongly enforceable but not realizable)Let  Define the specification Let the controller set be  with behaviorsThenStrong enforceability: for each allowed action in  there exists a controller producing it:  Thus  is strongly enforceable by Realizability:  . Hence  and  is not realizable by .Proposition 2.12 (Relation between different notions of enforceability): Let  be a set of controllers and  a specification. ThenPrefix-wise Enforceable: Say  denotes the set of all prefixes of strings in . A specification  is prefix-wise Enforceable by  if  Equivalently: for every observed prefix  and every continuation  for which the full specification admits some action  on , the controller  already offers at prefix  at least one action-prefix that can be extended to an allowed full action on .Prefix-wise enforceability is an explicitly online notion: it requires that a single controller  already commit, at each observation prefix , to some action-prefixes that are extendable to full allowed actions for every possible future continuation  that keeps the specification satisfiable. In practice, this is a useful notion when agents must act before observing the entire future, it demands that the agent’s early choices never rule out reaching some allowed final action once future observations arrive.Characterization of FSRA and ExamplesTheorem 3.13 (Characterization of FSRA) Let  be finite alphabets and let  be a deterministic function. Then there exists an FSRA  with induced transduction  iff all of the following hold:  is length-preserving: .  is prefix-preserving/causal: .  is finite memory:  has finitely many equivalence classes.(Equivalently:  is implemented by some FSRA  properties (1)–(3) hold.) Proof (Theorem 3.13): Let be an FSRA and suppose Length-preserving. By the FSRA definition the machine emits exactly one action symbol for each input symbol. Therefore for every  the produced action string has the same length as . Hence Prefix-preserving. The FSRA produces each output symbol at the time its corresponding input symbol is read; in particular the output produced after a prefix  is the prefix of the output produced after any extension . Formally, if the state sequence after reading  is  and the outputs are , these are initial symbols of the outputs produced on any longer input . Thus  is prefix-preserving.Finite memory. Let  be the finite state set of . Define a map  by . If  then for every continuation  the machine, starting from the same state, will produce identical future outputs on  and . Hence . Thus, the number of -equivalence classes is at most , so  has finite index.This completes the (⇒) direction. Note also that the FSRA gives explicit fixed maps  (so, time-invariance holds for any FSRA) Assume  is deterministic and satisfies length-preserving, prefix-preserving, and finite-memory. We build an FSRA  that implements .1. State set. Letbe the set of equivalence classes of . By finite-memory this set is finite. Write  for the -class of . Let the start state be .2. Define . For class  and observation  defineThis is well-defined: if  then , so for every continuation  we have ; taking  of the form  shows , hence .3. Define . We must define the single action symbol emitted when in class  upon reading . Because  is length-preserving and prefix-preserving, for every  and  the string  is a prefix of  andso  equals  concatenated with exactly one symbol. Thus there exists a unique  withDefine: i.e. the last symbol of . We must check  is well-defined: if  then  so for every continuation , . In particular for  we get , and for  we get . Therefore, the unique last symbols of  and  coincide; thus .4. The machine. Put . By construction  are total maps  and , respectively, so  is an FSRA (deterministic, finite , one output per input).5. Correctness . We show by induction on length of  that the action string produced by  on input  equals .Base : machine produces  and  (length-preserving and causal behavior give that conventionally).Inductive step: assume for  that the machine output equals  and that the machine reached state . On reading next symbol  the machine outputs , which we defined to be the unique symbol  with . Therefore, the machine’s output after  equals . The next state is =[uo]). This completes the induction.Hence .Thus (1)–(3) imply existence of an FSRA implementing . The constructed  are fixed maps (time-invariant) by construction. We call this model a Finite-State Reactive Agent because the name describes exactly what the math enforces. It is finite state because the agent’s memory is a fixed, finite set of states (the  classes), there is only a bounded amount of past information the agent can store. It is reactive because the agent responds to each observation immediately: for every input symbol it emits exactly one action symbol, and the action at time t depends only on the observations up to t (no lookahead, no delayed planning). And it is an agent because the whole object is a policy-like mapping from observation histories to actions a causal, online decision procedure. This label therefore does more than decorate the definition: it communicates the limits and strengths of the model at a glance (what the agent can implement, and what kinds of behavior definitely require richer models such as stacks, external memory, or unbounded belief states).Corollary 3.14 (Realizability of deterministic reactive finite-memory specifications by FSRAs)Suppose  be a deterministic behavior that is reactive (i.e. length-preserving and prefix-preserving) and has the finite-memory property. Define the induced specification .Then there exists a Finite-State Reactive Agent (FSRA)  such thatIn particular,  is realizable by the singleton controller set  Theorem 3.15 (Strict Separation of Realizability and Enforceability for FSRAs)There exist specifications  and  over a one-symbol observation alphabet  and action alphabet  such that: is strongly enforceable by the set of all FSRAs, but  is not realizable by any set of FSRAs (finite or infinite). is weakly enforceable by a finite set of FSRAs, but  is not strongly enforceable by any finite set of FSRAs.Proof.We treat the two constructions separately.Part 1: Construction of Let  and . For each  write  for the length- observation string. Define the length- action , the unique length- bitstring whose single 1 appears at position . Define the specificationThus, for every observed prefix  the specification allows exactly the single action .(i) Strong enforceability by the set of all FSRAs.For every  there is an FSRA  (With  states) that outputs  on the first  steps and outputs 1 on the -th step. Concretely,  simply counts up to  and emits the described pattern. Hence for each allowed pair  there exists an FSRA (namely ) that produces that exact pair. By definition (Def. 2.11) the set of all FSRAs therefore strongly enforces .(ii) Non-realizability by any set of FSRAs.Suppose  is any set (finite or infinite) of FSRAs and that  realizes , i.e.Then in particular every pair  belongs to the right–hand side, so for every  there exists some  with . But realizability as equality forces more: for every  and every  we must have . Hence for every  and every , . Therefore every  implements the same global map  defined by . In particular,  must contain at least one FSRA whose induced function is equal to .But the function  above is not implementable by any FSRA: the equivalence relation  has infinite index because distinct prefix lengths are distinguished by future behavior; equivalently  is not finite-memory and so (by Theorem 3.13) not implementable by any FSRA. Thus, no FSRA in  can induce , a contradiction. Therefore, no set  of FSRAs realizes .Part 2 – Construction of Again let  and . For each  setthe set of length- bitstrings that contain exactly  ones.(i) Weak enforceability by a finite set.Let  be the simple 2-state FSRA that ignores the input symbol and alternates outputs  (a 2-cycle). For every  that controller’s length- output contains exactly  ones (the ones appear at the even positions). Hence  for every . Therefore, the singleton finite set  satisfies the weak enforceability condition: for every observed history  there exists a FSRA in this finite set whose produced action at  is allowed.(ii) Failure of strong enforceability by any finite set.Fix any finite FSRAs set . For a given length  the set  can produce at most  distinct length- action strings (each FSRA gives one string of length ). But the size of the fiber  equals the binomial coefficient , which grows super-polynomially in . Thus, for sufficiently large  we have . Hence  cannot contain FSRAs producing every allowed string in . Since strong enforceability would require that every allowed action at each history be produced by some FSRA in the set (Def. 2.11), no finite  can strongly enforce . Example (FSRA – Remainder mod k of the binary stream)Setup: Fix integer . Observation Alphabet  (bits observed); action alphabet  (we let each output symbol be the remainder, for readability). Define deterministic reactive  by: for input , the output at step  is the remainder of the integer represented by the prefix  (interpreting the bit string in binary, most-significant-bit first) modulo . Equivalently letand define  (a string of symbols in ). A small working example: () Input .  Output  Finite Memory: Consider the relation  on prefixes:Claim:  iff the current remainders after  and  are equal, i.e. , where  denotes the integer value of  If  then for any continuation  the recurrence  depends only on  and , so . Hence .Conversely, if , take  to be a single symbol  which distinguishes the remainders by the recurrence. So , hence .Therefore, the equivalence classes are exactly the  residue classes modulo . So  has finite index  – finite memory. FSRA: Define  by (the residue classes)..For state  and observation :After reading a prefix whose current remainder is , seeing bit  updates the remainder to  and the FSRA emits that new remainder as the action for that timestep. This is deterministic, length-preserving, and causal.For example: , input :start from .read 1:  emit read 0: , emit .read 1: , emit .Output: Finite-State Reactive Agent for binary remainder mod 3.Specification: where   and For every observation prefix , the unique allowed action history is the sequence of remainders obtained by repeatedly updating the current remainder by  for each next bit . Thus So . Example (DFA and FSRA – Absence of a substring)Let  which is the collection of all finite strings over  with no two consecutive  symbols. The empty string  We first see a DFA  for it, to be specific a minimal DFA (i.e. the smallest (with respect to number of states) DFA which accepts  )States  where: = “no  at the end / start state” (last symbol was  or empty), = “last symbol was  and no forbidden  has occurred so far”, = dead/trap state (we have seen substring ).Start state: .Accepting states:  (we accept as long as  has not occurred).Transition function : Minimal DFA detecting absence of substring “11”Setup: Observations  (each step the agent observes a bit). Actions (at each step the agent emits 1 iff the prefix seen so far is in , else 0). Define the deterministic reactive function  by: for input , the -th output symbol  equalSo, Finite Memory: The relation  has exactly three equivalence classes, so  has finite memory. (We have proven our claim in the appendix)Define three candidate classes by partitioning all prefixes  into three sets: (include  here). (the dead/trap class).These three sets are disjoint, and their union is all prefixes. Note: We are using  here and in the proof given in the appendixEvery prefix belongs to one of the three classes , each class is contained in a single -equivalence class, and the three classes are pairwise distinct. Therefore  has exactly three equivalence classes (finite index). By Definition 2.8 (finite-memory),  has finite memory. As a corollary, any FSRA implementing  requires at least 3 states (one per  class); the standard construction produces the 3-state FSRA. FSRA: Construct FSRA  directly using the DFA states. as above. is the start state. (same as DFA): (emit 1 iff the new state after reading the input is accepting): For example, input .Start in .Read 1: new state , emit  because 1 alone is in . (Output so far: 1.)Read 0: new state , emit  because  Output: 11.Read 1: new state , emit  because . (Output: 111.)If we had input 11 then at second step:Start . read 1 → , emit 1.Read 1 → , emit 0. So, outputs 10 indicating the prefix of length 2 is rejected.FSRA detecting absence of substring “11”Specification: The realized specification is the singleton-valued relationFor each observation prefix , the fiber  is the unique action history produced by the FSRA after prefix .   Example (DFA and FSRA – Constraint on Count and Ending)Fix , observation alphabet . Fix indices , integers . Define(By convention the empty string does not end in  and thus, is not in  unless you explicitly want to treat it otherwise.) We build a deterministic finite automaton that keeps two pieces of summary information:the residue  equal to the current count of  modulo , andthe last seen symbol , where  denotes “empty / no last symbol yet” (useful for the initial state).Define. So .Start state  (zero occurrences seen, no last symbol).Transition  given by𝟙 𝟙 is 1 if  and 0 otherwise.)Accepting set An example,  Setup: We convert the language into a Boolean behavior, which is a convenient FSRA target. Let Observations:  and Actions:  Define  by: for any input  and index 𝟙So  is the binary stream indicating, step-by-step, whether the observed prefix satisfies the two constraints (count-of- and prefix ends with Finite Memory: The classes of  are precisely the DFA states. In particular  has at most  classes, so  is finite memory. Using the similar proof technique like the previous example we can show that index of  is .FSRA: We now give the Finite-State Reactive Agent  that implements the indicator function  above. (same as DFA states). (per-step indicator).Start .Transition  (state update) is the same as the DFA:𝟙Output function 𝟙Take , alphabet . Let  , . Let  and . So, we require  ends with d.Start . Consider input.Read : new residue , last symbol b. New state . Since  but last symbol , (). Output so far 0.Read :  stays 1, last  → state . Not . Output 0. Outputs 00.Read : , last  →  → output 0. Outputs 000.Read : (r=2), last  →  → output 0. Outputs 0000.Read : (r=2), last  → . Since  output 0. Final outputs: 00000.Now consider appending four more  then : continuation .After appending  the residue increases by 4 → new residue .After final  the residue remains , last symbol d, and outputs 1.  ConclusionThinking of agents as simple finite-state machines helps clarify what they can and cannot do. It shows which interactive behaviors can be handled using only limited memory, and when more powerful models are truly necessary. This perspective makes the design space clearer: some goals can be achieved with small, structured controllers, while others require fundamentally richer mechanisms.There are several natural directions to explore. We can study what changes if the agent is allowed to use randomness, make delayed actions, or update its behavior over time through learning. Each of these makes the model more realistic, but also raises new theoretical questions. Important technical problems include: (i) understanding how hard it is to decide whether a specification can be enforced with limited memory, (ii) finding the smallest possible controller that satisfies a given specification, etc.These directions connect verification, synthesis, and learning, promising a practical theory for building simple, verifiable reactive agents. AppendixDeterministic Finite Automata and TransducersDefinition A.1 (Deterministic Finite Automaton – DFA) A DFA is a tuple , where:  is a finite set of states  is a finite input alphabet   is the transition function (assumed total i.e. its defined for all possible tuples )  is the start state  is the set of accepting statesA DFA defines a language  (= set of all finite length string formed using the alphabet in the alphabet set ) by the usual acceptance criterion: a string  is accepted if and only if the state reached after reading  from  belongs to .Here is a simple example to understand it. We will construct a DFA  which that reads a binary string and accepts if the number of 1s in the string is even. (And remains in the final state whenever the number of 1s encountered till now is even.)Formally, define the automaton  as follows:States:  where the state encodes whether the agent has seen an even or odd number of 1s so far.Alphabet: Start state:  (before reading anything, the count of 1s is zero, which is even)Accepting states: Transition function :Reading 0 does nothing to the parity:   Reading 1 flips the parity:DFA for Parity of number of 1sGiven any input string  , the DFA starts in  and updates its state step-by-step according to . The string is accepted if the final state lies in , i.e., if the total number of 1s in  is even.For example:Input 1010 → two 1s → ends in  → acceptedInput 111 → three 1s → ends in   → rejectedYou can view this DFA as a minimal agent with memory:The state is the agent’s internal memory (here just one bit: even vs odd).The alphabet is the agent’s observation space.The transition function is the agent’s update rule.The accepting states encode a goal or success condition. It already captures the core agent loop: observe → update internal state → decide success/failure. Definition A.2 (Mealy Machine – A Deterministic Finite Transducer) A Mealy machine is a tuple , where: is a finite set of states    is a finite input alphabet    is a finite output alphabet    is the (total) transition function  is the output function is the start stateOn input  starting from , the machine produces the output:where the sequence of states   is defined by .In a Mealy machine, each transition produces an output.The output generated at each step depends on both the current state and the current input symbol and is produced synchronously with the consumption of that input symbol.Thus, a Mealy machine defines a functionthat maps every input string to an output string computed incrementally during the run of the machine.Equivalently, the transition and output functions can be combined into a single functionwhereThis formulation emphasizes that output is produced on transitions, which is why Mealy machines are also called deterministic finite-state transducers. Now, let’s discuss a small example. We construct a Mealy machine that, on each input symbol:outputs 0 when it reads 1outputs 11 when it reads 0Thus, the machine maps any binary input string to a binary output string in which every 1 appears in pairs, so the total number of 1s in the output is always even.Define  by: (a single state, the machine is memoryless) (input alphabet) (output alphabet) (stay in  on every input) (output 0 on input 0, output two 1’s on input 1)Equivalently you can combine  and  into one  with   Notation: Input/output. Mealy Machine to output double 1s for each 1 in input. Here is an example run: Input: . State sequence:  (always ) Output (stepwise): Concatenate to get . Count of 1’s in  = = even.This Mealy machine produces strings in the languageEvery string in  has an even number of 1’s, but not every binary string with an even number of 1’s is in  . For example the string 101 has two 1’s (even) but is not in . So the machine guarantees parity but does not generate all even-1 strings.If your goal is all even-1 strings, you need a different transduction strategy (for example allowing outputs ε or single 1’s at some steps plus internal bookkeeping so the machine can place 1’s arbitrarily while keeping total parity even). That requires either a larger state space or different output rules. The Mealy machines defined above allow a single observation to trigger a variable-length burst of output. While this generality is convenient for describing transductions, it is too permissive for modeling agents that must performs a single action at each step in real time.In an online agent setting, we might want a stricter notion: at each timestep, the agent observes exactly one symbol and immediately emits exactly one action. This synchrony ensures that the agent’s behavior is length-preserving.To capture this reactive, per-timestep interaction pattern, we now introduce the standard Mealy machine, obtained by restricting the output function to produce a single symbol per input symbol. Definition A.2′ (Standard Mealy machine) A standard Mealy machine is a tuple  where all components are as in Definition A.2 except the output function is  (a single output symbol per input symbol). On input , the output is ), hence .  Think of this Mealy machine as a tiny reactive agent:Observation space  is what the agent sees at each timestep.Action space  is what it can emit; here actions can be length-2 strings (so actions may produce multiple primitive outputs in one step).The internal state  is the agent’s memory (here trivial: a single state, so no memory).The pair  is the agent’s policy + state-update rule. On every observation the agent immediately emits output determined by .In this example the agent’s policy enforces a global constraint (even parity of 1’s) by a local rule (“whenever you see a 1, emit 11”). This shows two useful points for thinking about agents:Global invariants from local rules: A simple per-step policy can guarantee a nontrivial global property of the interaction history (here: even parity). Expressivity vs constraints: The policy here is conservative it enforces parity but at the cost of reduced expressivity (not all acceptable-looking outputs are reachable).  Definition A.3 (Moore Machine) A Moore machine is a deterministic finite-state transducer given by a tuple , where all components are as in the Mealy machine definition except that the output function is , depending only on the current state.On input , the machine produces output:where  for  (with  the initial state). Equivalently, some conventions output  as the first symbol, either convention yields equivalent expressiveness.Mealy and Moore machines are equivalent in expressive power: every Mealy machine can be converted to an equivalent Moore machine with at most a linear increase in the number of states, and vice versa.  Recall the Mealy machine from the previous example, which on input 0 outputs 0 and on input 1 outputs 11, thereby ensuring that the total number of 1’s in the output is always even.We now construct a Moore machine that produces exactly the same input–output behavior.Formally, define  as follows:States: Input alphabet: Output alphabet: Initial state: Transition function (): Output function (): A Moore Machine which is equivalent to the Mealy Machine in previous exampleThus:whenever the machine enters state , it outputs 0,whenever it enters state , it outputs 11.For example, on input , the state sequence is ,and the output is .As in the Mealy case, each input symbol 1 contributes exactly two 1’s to the output, and each 0 contributes none. Hence every output string has an even number of 1’s.This Moore machine implements the same transduction as the Mealy machine in the previous example, but the output is associated with states rather than transitions. The extra state  is needed to remember which output should be produced after reading the current input symbol. This illustrates the general transformation from Mealy to Moore machines: outputs that depend on  in a Mealy machine are encoded into states whose labels determine the output in a Moore machine.Moore machines label states with outputs rather than labeling transitions. The most general Moore transducer can emit a short burst of output when a state is entered, but that extra generality is unnecessary when modeling agents that act once per timestep. To model a reactive, step-by-step agent it’s clearer to assume each entered state produces exactly one action symbol. This synchronous restriction  gives the standard Moore machine.Definition A.3′ (Standard Moore machine) A standard Moore machine is a tuple  where all components are as in Definition A.3 except the output function is . On input , the output is . From an agent viewpoint, Moore machines correspond to agents whose actions depend only on their current internal state, not directly on the most recent observation. The observation updates the state via , and the action is then emitted as a function of that state alone. Compared to Mealy machines, Moore machines enforce a stronger separation between state update and action selection. This distinction mirrors a common design choice in AI systems: whether actions are allowed to depend directly on raw observations, or only on a processed internal representation. The equivalence between Mealy and Moore machines suggests that this is largely a representational choice rather than a fundamental difference in capability. Remaining ProofsProof for non-finite memory of For each integer  let  (The prefix consisting of  ones), so . We show  and  are not -equivalent whenever .WLOG . Let  (nonzero). Choose an integer  large enough that(Existence: for fixed nonzero , the equation  implies ; this has only finitely many integer solutions, so for sufficiently large  the second condition holds.)Set  and take continuation  (i.e.  more ones). Then, a perfect square, so the output bit at the final step of  (and indeed one of the outputs in the run) is ., which by choice of  is not a square, so the corresponding output bit in that position is .Thus  and  differ, so . Because there are infinitely many , the relation  has infinitely many equivalence classes. Hence  does not have the finite-memory property.Proof for finite memory of :If two prefixes lie in the same class, then they are -equivalent. Take  in the same class, we show  Case 1 (): If a prefix already contains 11, then every extension also contains 11. Hence for any continuation , every longer prefix of  (from the point where 11 first appeared onward) is not in the language, and the same holds for . Concretely, the per-prefix membership bits for  and  coincide at every position (they may differ on earlier positions before the first 11, but those are identical by assumption of same class, both have already passed the 11 event). Thus .Case 2 (): Neither  nor  contains 11, and both end in 0. From that point onward, whether a future prefix  is in the language depends only on the continuation ) and the fact that the last seen symbol before  is 0. Since both  and  share this same “local summary” (no 11 seen, last symbol 0), for every continuation  the sequence of future per-prefix membership bits after position  depends only on  and not on the internal details of  or . Also the membership bits up to  and  are the same shape relative to their class. Hence .Case 3 (): Similar argument: neither contains 11 and both end with 1. Future membership after appending  depends only on the fact “no 11 so far and last symbol = 1” and on . Therefore . So, each class is contained in a single -equivalence class. We now show that for any two distinct classes there exists a continuation that makes the two resulting outputs differ; this proves they are different -equivalence classes.Distinguish  Let  and . Consider the one-symbol continuation . Then:  contains 11 (since u ended with 1), so the output bit at that step is 0 (the prefix ending at that position is not in the language). does not contain 11 (since v ended with 0), so the output bit at that step is 1.Thus  and  differ at the last position, so .Distinguish : Let  and . Take continuation  (a block of  zeros) of any positive length. For every extension , since 11 already occurred inside , none of the prefixes at or after the position where 11 occurred will be in the language, in particular, the outputs at and after position  are zeros. But for  (where v has not seen 11 and ends with 0), all those extended prefixes continue to avoid 11 and so the outputs at these positions are ones. Thus  and  differ (for instance, at position , so . Distinguish : Similar: take . For , outputs after  are zeros; for  and suitable  (even  suffices), some prefixes of  will still avoid 11 and so produce ones. Concretely,  keeps no 11 and so has output 1 at that position, while  has output 0. So Hence the three candidate classes are pairwise distinguishable, so they correspond to three distinct -classes. Discuss ​Read More

​Published on February 15, 2026 11:38 AM GMTThis work was produced during the Dovetail Research Fellowship. I’m grateful to Alex, Alfred, Winter, and Santiago, and to everyone who provided feedback, and to the fellowship community for their discussions.AbstractWe study AI agents through the lens of finite-state transducers and string-to-string behaviors. Instead of treating automata purely as language recognizers, we model agents as systems that map observation histories to action histories while interacting with an environment online. We introduce Finite-State Reactive Agents (FSRAs) as deterministic, finite-memory, reactive policies, establishing their equivalence to standard Mealy and Moore machines. Our main result (Theorem 3.13) characterizes exactly which behaviors can be implemented with finite internal state: a deterministic behavior is FSRA-implementable if and only if it is length-preserving, prefix-preserving (causal), and has finite memory, & relate FSRAs to standard Mealy and Moore machines and identify when richer computational models are unavoidable.  We also introduce specifications (constraints on allowed agent behaviors) and formalize notions of weak enforceability, strong enforceability, and realizability. Using our characterization theorem, we prove strict separations between these notions (Theorem 3.15), demonstrating when finite-state agents can or cannot satisfy given specifications. IntroductionDeterministic Finite Automata (DFAs) are usually introduced as a basic model of computation: simple machines that read an input symbol by symbol and decide whether to accept or reject a string. In theoretical computer science, they are often treated as a warm-up before more powerful models such as Turing machines.In this work, we take a different perspective. A DFA can also be viewed as a minimal agent: it maintains a small internal state, receives observations from the environment, updates its state deterministically, and eventually produces an outcome. Despite their simplicity, DFAs already capture the core interaction loop shared by many AI systems: observe → update internal memory → act or evaluate.This observation motivates a shift from language recognition to agent behavior. We introduce the model of a Finite-State Reactive Agent (FSRA), which generalizes DFAs from passive evaluators to active, online systems that map streams of observations to streams of actions using finite internal memory. FSRAs formalize deterministic, reactive agents whose behavior is causal and prefix-preserving, and whose internal state evolves according to a fixed update rule (time-invariance). They let us ask precise questions about memory, state abstraction, observability, and goal satisfaction without the complications of learning (updating the policy from data), optimization (searching over policies or parameters to maximize an objective), or stochasticity (randomness in the agent’s actions or environment). These characterizations let us state simple, checkable conditions for when a specification is enforceable or realizable by a set of finite-state controllers.Outline. The rest of this post is organized as follows. Section 1 introduces the Finite-State Reactive Agent (FSRA) model and states its equivalence with standard Mealy and Moore machines. Section 2 develops formal properties of agent behaviors: determinism, length-preservation, causality (prefix-preservation), finite memory, reactivity, and time-invariance, modeled as properties of string-to-string functions. It concludes by introducing specifications (constraints on allowed behaviors) and three enforceability notions: weak, strong, and realizability. Section 3 presents our main results: Theorem 3.13 characterizes exactly which behaviors are FSRA-implementable (reactive + finite-memory), Corollary 3.14 shows that such behaviors induce realizable specifications, and Theorem 3.15 proves strict separations between the enforceability notions, demonstrating fundamental gaps between what finite-state agents can weakly satisfy versus fully realize. We conclude with worked examples including binary remainder computation (mod k), substring avoidance (no “11”), and count-with-ending constraints. Standard automata-theoretic background (DFAs, Mealy machines, Moore machines) and deferred proofs are collected in the Appendix.Finite State Reactive AgentMealy machines (Definition A.2 in the Appendix) are deterministic finite-state transducers that produce an output symbol synchronously with each input symbol.  This makes them a natural formal model for very simple online agents: the input alphabet corresponds to the agent’s observations, the finite state encodes the agent’s memory, and the per-step output is the agent’s immediate action.  To make this perspective explicit and to use language familiar to AI and alignment readers, we adopt the name Finite-State Reactive Agent (FSRA) for a Mealy machine interpreted as a deterministic, finite-memory, online policy.  The formal Definition 1.1 below gives the FSRA tuple and the precise causal / prefix-preserving behavior we require.Definition 1.1 (Finite State Reactive Agent – FSRA)A Finite-State Reactive Agent (FSRA) is a deterministic, finite-memory, online transducer that maps each input (observation) prefix to an output (action) prefix of the same length i.e. it emits exactly one action symbol for every observation symbol it consumes.Formally, an FSRA is a tuple , where is a finite set of states (the agent’s internal memory), is a finite observation alphabet (input), is a finite action alphabet (output), is the (total) state-update function, is the output (action) function, producing exactly one action per observation, is the start state.The FSRA implements a functionthat is length-preserving:and causal / prefix-preserving: the -th output symbol depends only on the first  input symbols, and for any prefix  and continuation ,  is a prefix of .Concretely, on input  starting from , define the state sequence with  given. The produced action sequence is, so .Remark (generalization). If you want to allow a variable number of action symbols per observation, simply let . That yields the Mealy/transducer formulation where one transition may emit a finite output string; the FSRA above is the special case that is strictly one action per observation.Calling this model an FSRA makes the intended interpretation explicit:   are observations,  are actions, and the pair  together specify a deterministic, finite-memory policy. An FSRA captures the simplest kind of online agent (these terms are discussed in detail in the next subsection): it is reactive (length- and prefix-preserving, emitting exactly one action per observation), has finite memory encoded by the finite state set Q, and is deterministic.This model is useful for reasoning about what can be implemented with finite memory, and for making clear where finite-state agents fail. When one wants to model policies with longer or variable delay between observation and action, or policies that maintain richer beliefs, the FSRA can be generalized (e.g.  or by increasing , but ultimately some behaviors require strictly more expressive models (e.g. pushdown automata, Turing Machines, or stochastic/learning agents).Proposition 1.2 (Equivalence with Mealy / Moore)Let  be a finite observation alphabet set and  a finite action alphabet set.Every FSRA is, by definition, a standard Mealy machine (with input alphabet , output alphabet , and single-symbol output per transition – Definition A.2′ in the Appendix). Conversely, every standard Mealy machine (whose output function produces exactly one symbol per input) is an FSRA. This equivalence is immediate from the definitions.For every FSRA  there exists an equivalent standard Moore machine (Definition A.3′ in the Appendix) computing the same length-preserving transduction , and conversely, every standard Moore machine can be converted into an equivalent FSRA.Brief justification. Part 1 is just a renaming of components. For Part 2, the FSRA-to-Moore direction requires splitting each state  into copies  – one for each output symbol a that  can produce, so that the output can be read from the state alone; this can increase the state count up to . The Moore-to-FSRA direction works by defining  where  is the Moore output function, moving the output from the arrived-at state to the transition that enters it.Formal Properties of Agent BehaviorsIn this section, we define several properties of general observation-to-action behaviors, modeled as string-to-string functions. We then identify which of these properties are satisfied by Finite-State Reactive Agents (FSRAs).Let  and  be finite alphabets, representing observations and actions respectively. We consider functionswhere  denotes the collection of nonempty subsets of . Such a function assigns to each finite observation history a set of admissible action histories.Later, we will specialize to deterministic behaviors, which are functions of the formExample (Behavior)Let . Define a behavior  as follows:In this example, the behavior is nondeterministic: after some observation histories, multiple action histories are permitted.Intuitively, a function  describes everything an agent is allowed to do, rather than what it will do. For each observation history,  lists the action histories that are consistent with some specification, constraint, or design goal. Deterministic agents correspond to the special case where exactly one action history is allowed at each observation history.It is useful to separate behaviors from agents. A behavior  specifies which action histories are acceptable after each observation history, without committing to a particular decision rule. An agent, by contrast, must choose a single action at each step.We also distinguish both from a specification: a specification P is a constraint on what an agent is allowed to do (formally, a relation ; see Definition 2.5), whereas a behavior is what the agent actually does. This three-way distinction: specification, behavior, agent, lets us ask precise questions. For example: “Can a simple agent satisfy this specification?” (i.e., does there exist an agent whose behavior stays within the allowed set?). In the formal-methods and control-theory literature, this is called enforceability: a specification is enforceable by a set of agents if at least one agent in that set produces only allowed behaviors. We can also ask the converse: “Can a set of agents collectively realize every behavior the specification allows?” These notions are formalized in Definition 2.11.Allowing  to be nondeterministic is intentional. Many natural specifications such as safety constraints, permissive policies, or sets of acceptable plans do not prescribe a unique action at every step. Instead, they describe a space of allowed behaviors. Definition 2.3 (Deterministic) A function  is deterministic if In this case, we may write  and identify  with the unique element of the singleton set. Determinism means that given any observation history, the action sequence is uniquely determined. Note that a deterministic behavior effectively is an agent’s policy, the distinction between “set of allowed action histories” and “committed choice” collapses when the allowed set is always a singleton. The behavior-vs-agent distinction is most useful in the nondeterministic case, where a behavior permits multiple options and the question becomes which agent (i.e., which deterministic selection) to deploy. Example (Deterministic)Let  and . DefineSo, the function returns  and  alternatively.Determinism is the simplest way to turn a permissive specification into an actual policy: once you pick a deterministic  you no longer have a set of allowed behaviors but a single thing the agent will actually do (to follow the behavior). This is useful when reasoning about concrete guarantees (e.g., “will this agent ever output bad?”) but remember determinism is a modeling choice, real systems are sometimes nondeterministic (stochastic) by design.Definition 2.4 (Controller and Controller Set) A controller is any concrete system (a finite automaton, a program, a circuit, a neural network, etc.) that determines a behaviorwhere  is the set of action histories the controller may produce after observation history o.  We deliberately leave “concrete system” informal: the only thing that matters formally is the induced behavior .A controller set  is a set of controllers; we write  and use the shorthand  for the mapping from controllers to their induced behaviors. (No additional structure on  is assumed, it is simply the collection of controllers under consideration, e.g., “all FSRAs” or “all 3-state Mealy machines.”)If each   is single valued we call the controller deterministic and identify Example (Controllers)Let  be the set containing two deterministic behavior/functions with:– .Then  is a controller set that can produce at least depending on which controller is selected.Definition 2.5 (Specification)Let  and  be finite alphabets, and letbe a specification (a relation between observation histories and allowed action histories).  For an observation history  write the fiber (i.e., the set of all action histories that the specification allows after ):A specification is not the same as a behavior. A behavior describes what a controller does; a specification describes what a controller is allowed to do. In particular,  can contain many action histories, it expresses a constraint (“any of these actions would be acceptable”) rather than a commitment (“this is the action I will take”). A single controller typically covers only some of the allowed pairs. The enforceability and realizability notions in Definition 2.11 make this gap precise: they ask whether controllers in a set  can, individually or collectively, cover what  allows. Let  be a set of controllers; each controller  induces a function (behavior)so that is the set of action sequences controller  produces on observation history . Define the relation implemented by  asObserve that .Think of the controller as the machine you could put on the robot: it has internal structure and a rule for mapping sensory input to motor outputs. The behavior ​ is the abstract trace of possibilities that a machine can generate. Distinguishing these levels is crucial: a specification can permit many possible action traces (multivalued), but only some will be reachable by your finite-state controllers. The phrase “is it enforceable?” asks whether the machines you can actually implement can realize at least one of the allowed traces at each point; “is it realizable?” asks whether the machines collectively capture exactly the specification.Definition 2.6 (Length Preserving)A function  is length-preserving if   For deterministic  this reduces to Example (Length-preserving)Let ,  and definei.e.   Then . Intuitively, a length-preserving function produces exactly as many output symbols as it consumes input symbols. This models an agent that emits one action per observation (or more generally, a fixed number of actions per observation). It’s an explicit modeling choice: many controllers (especially embedded or real-time ones) naturally fit this model.Definition 2.7 (Prefix Preserving/Causal)A function  is prefix-preserving (or causal) if For deterministic  this is equivalent to Causality captures the notion of online computation: the output produced after reading a prefix  must remain consistent (as a prefix) with the output produced after reading any extension . In other words, the agent commits to its early outputs and cannot retroactively change them when new observations arrive.Example (Prefix-preserving / causal)Let ,  and define  by the stepwise rule:on seeing  output ; on seeing  output .Formally, .Then  is a prefix of  for every .Definition 2.8 (Finite Memory)A deterministic function  has the finite-memory property if the equivalence relation  defined byhas finite index (i.e., finitely many equivalence classes).The relation  partitions observation prefixes into equivalence classes that cannot be distinguished by any future continuation. If this partition is finite, then the agent effectively needs only finite memory to encode its past: it suffices to remember which equivalence class the history belongs to. This is the string-function analogue of the Myhill-Nerode theorem for languages.The finite-memory condition is the formal version of “bounded internal state.” It says there are only finitely many distinct ways the future can unfold after different pasts, so keeping one of finitely many summaries is enough. This is exactly why finite automata are a natural model for resource-limited agents.Example (Finite Memory – Periodic every ) Fix . Let . Define deterministic  byi.e. the symbol emitted at step  depends only on the position  (every -th output is ).Claim.  has finite memory; the index of  is exactly .Proof. Define relation (R) on prefixes byThere are at most  classes under . If , then for every continuation  the positions in  where outputs equal 1 are the same (they are exactly those indices congruent to ). Hence . Thus  has at most  classes. Conversely, two prefixes with different lengths modulo  produce different future outputs (their next few outputs disagree), so all  residue classes are distinct. Therefore  has exactly  classes, so finite index. Example (Not Finite Memory – Perfect Square Count Predicate)Let . Define deterministic reactive  by the stepwise rule:at each step emit  iff the total number of s seen so far is a perfect square; otherwise emit .Formally, if after reading prefix  the count of 1s is , then on reading a next symbol the output at the time the -th input is processed depends on whether  is a square.Claim.  has infinite index (so  is not finite memory).Proof for this claim can be found in appendix Definition 2.9 (Reactive)A behavior  is reactive if it is length-preserving, andprefix-preserving, i.e.  ( is a prefix of  ). Equivalent characterization for deterministic behavior: A behavior    is reactive if and only if Reactivity does not imply memory-lessness or finite memory. A reactive behavior may depend on the entire observation history , not just a bounded summary of it. (In particular, “finite memory” is strictly more general than the memoryless/Markov case, where the action depends only on the most recent observation: a finite-memory agent compresses the full history into one of finitely many states, each of which can encode patterns spanning many past observations.)Reactivity formalizes the intuitive idea that an agent must act in real time. Once an action is taken, it cannot be retracted when new information arrives. This matters for enforceability: a specification may allow some full action sequence in hindsight, yet no reactive agent can realize it because early commitments inevitably block all allowed continuations. Prefix-wise enforceability makes this tension explicit by asking whether an agent can always keep some allowed future open while acting online. We will see these definitions more formally later.Example (Reactive)Same as the prefix-preserving example above:  maps observations to immediate outputs, so . This is reactive and causal by construction.Example (Length-preserving but non-reactive behavior)Let  and . Define a deterministic function where the output string has the same length as . If  denotes the empty string then . This behavior emits the correct number of actions, but it does not act online. The first action symbol depends on whether the entire observation history has even or odd length, which cannot be determined when the first observation arrives. As a result, the agent’s early actions are not commitments it may retroactively “change its mind” once more observations arrive. This violates the core idea of reactivity: acting step by step in real time.Example (Reactive but not finite memory)Let , . Define a deterministic reactive  by the stepwise rule:at each step output  if the total number of 1s seen so far is a perfect square, otherwise output .Formally , where  depends on the count of 1s in .This  is reactive (stepwise defined) but not finite-memory, since tracking “is count-of-1s a square?” requires unbounded memory.This example shows reactivity alone doesn’t bound memory cost. The agent behaves online (each step emits one output) yet the rule depends on an unbounded statistic (counts), so no finite-state controller can implement it. The properties above (length-preserving, prefix-preserving, finite memory) are exactly the ingredients of our main FSRA characterization theorem (Theorem 3.13). We will state and prove it in Section 3. Before that, we introduce the idea of enforceability and realizability, which we will use later when talking about which specifications such agents can satisfy. Definition 2.10 (Time-invariance)A family of online behaviors may be specified by a step-indexed family of per-step update/output rules , where each  maps a current internal state and the new observation to a next state, and  maps the current state and observation to the emitted action. We call the behavior time-invariant (or stationary) if the per-step rules do not depend on ; that is, there exist maps  and  such that Time-invariance adds the further restriction that the decision rule doesn’t secretly depend on how many steps have elapsed.Definition 2.11 (Enforceability and Realizability).Weak enforceability: We say that specification is weakly enforceable by a set of controllers  if equivalentlyAnother way to define would be like thiswhenever the specification allows some action at history , at least one controller in can produce an allowed action at . The controller that works may depend on .Weak EnforceabilityStrong enforceability: We say that  is strongly enforceable by a set of controllers  if equivalently every action allowed by the specification at  is realized by some controller in  (the realizing controller may depend on the action and on ).Strong EnforceabilityRealizability: We say that  is realizable by a set of controllers  if i.e. for every  there exists some controller  with , and conversely every  emitted by some controller in  belongs to .Deterministic simplifications. When  induces a set of deterministic function i.e. , then we can just write:For Weak enforceability:         equivalentlyFor strong enforceability:           equivalently  Special case: singleton controller. When  is just a singleton set i.e.  then the definitions three definitions just simplify to: Weak Enforceability reduces to  and in the deterministic case this is  whenever . Strong Enforceability reduces to  (i.e. every allowed action must be produced by the single controller), which in the deterministic case means  is either empty or equals .Realization reduces to Enforceability compares an abstract specification  (what counts as allowed action histories after each observation history) to what a set of concrete controllers  can actually produce. Weak enforceability asks only that, whenever the spec allows at least one action after a history , some controller in  can produce at least one allowed action at  (the controller may depend on the history). Strong enforceability demands more: every action that the spec allows at  must be produced by some controller in  (so the controllers collectively realize every permitted action). Realizability asks that the set   exactly matches the specification: the set of all (history, action) pairs realized by controllers in   is equal . These notions let us formalize intuitions like “the spec is implementable by simple reactive agents” and to separate cases where a spec is merely compatible with controllers from cases where it is exactly captured by them.Example (Weak but not Strong)Let  Define the specification (For every other observation string .)Let the controller set be the singleton  with deterministic behavior  given byThen:For strong enforceability we require  with . But  and no  produces  ; hence  is not strongly enforceable by .Realizability asks . Here  ; thus is not realizable by .Example (Strongly enforceable but not realizable)Let  Define the specification Let the controller set be  with behaviorsThenStrong enforceability: for each allowed action in  there exists a controller producing it:  Thus  is strongly enforceable by Realizability:  . Hence  and  is not realizable by .Proposition 2.12 (Relation between different notions of enforceability): Let  be a set of controllers and  a specification. ThenPrefix-wise Enforceable: Say  denotes the set of all prefixes of strings in . A specification  is prefix-wise Enforceable by  if  Equivalently: for every observed prefix  and every continuation  for which the full specification admits some action  on , the controller  already offers at prefix  at least one action-prefix that can be extended to an allowed full action on .Prefix-wise enforceability is an explicitly online notion: it requires that a single controller  already commit, at each observation prefix , to some action-prefixes that are extendable to full allowed actions for every possible future continuation  that keeps the specification satisfiable. In practice, this is a useful notion when agents must act before observing the entire future, it demands that the agent’s early choices never rule out reaching some allowed final action once future observations arrive.Characterization of FSRA and ExamplesTheorem 3.13 (Characterization of FSRA) Let  be finite alphabets and let  be a deterministic function. Then there exists an FSRA  with induced transduction  iff all of the following hold:  is length-preserving: .  is prefix-preserving/causal: .  is finite memory:  has finitely many equivalence classes.(Equivalently:  is implemented by some FSRA  properties (1)–(3) hold.) Proof (Theorem 3.13): Let be an FSRA and suppose Length-preserving. By the FSRA definition the machine emits exactly one action symbol for each input symbol. Therefore for every  the produced action string has the same length as . Hence Prefix-preserving. The FSRA produces each output symbol at the time its corresponding input symbol is read; in particular the output produced after a prefix  is the prefix of the output produced after any extension . Formally, if the state sequence after reading  is  and the outputs are , these are initial symbols of the outputs produced on any longer input . Thus  is prefix-preserving.Finite memory. Let  be the finite state set of . Define a map  by . If  then for every continuation  the machine, starting from the same state, will produce identical future outputs on  and . Hence . Thus, the number of -equivalence classes is at most , so  has finite index.This completes the (⇒) direction. Note also that the FSRA gives explicit fixed maps  (so, time-invariance holds for any FSRA) Assume  is deterministic and satisfies length-preserving, prefix-preserving, and finite-memory. We build an FSRA  that implements .1. State set. Letbe the set of equivalence classes of . By finite-memory this set is finite. Write  for the -class of . Let the start state be .2. Define . For class  and observation  defineThis is well-defined: if  then , so for every continuation  we have ; taking  of the form  shows , hence .3. Define . We must define the single action symbol emitted when in class  upon reading . Because  is length-preserving and prefix-preserving, for every  and  the string  is a prefix of  andso  equals  concatenated with exactly one symbol. Thus there exists a unique  withDefine: i.e. the last symbol of . We must check  is well-defined: if  then  so for every continuation , . In particular for  we get , and for  we get . Therefore, the unique last symbols of  and  coincide; thus .4. The machine. Put . By construction  are total maps  and , respectively, so  is an FSRA (deterministic, finite , one output per input).5. Correctness . We show by induction on length of  that the action string produced by  on input  equals .Base : machine produces  and  (length-preserving and causal behavior give that conventionally).Inductive step: assume for  that the machine output equals  and that the machine reached state . On reading next symbol  the machine outputs , which we defined to be the unique symbol  with . Therefore, the machine’s output after  equals . The next state is =[uo]). This completes the induction.Hence .Thus (1)–(3) imply existence of an FSRA implementing . The constructed  are fixed maps (time-invariant) by construction. We call this model a Finite-State Reactive Agent because the name describes exactly what the math enforces. It is finite state because the agent’s memory is a fixed, finite set of states (the  classes), there is only a bounded amount of past information the agent can store. It is reactive because the agent responds to each observation immediately: for every input symbol it emits exactly one action symbol, and the action at time t depends only on the observations up to t (no lookahead, no delayed planning). And it is an agent because the whole object is a policy-like mapping from observation histories to actions a causal, online decision procedure. This label therefore does more than decorate the definition: it communicates the limits and strengths of the model at a glance (what the agent can implement, and what kinds of behavior definitely require richer models such as stacks, external memory, or unbounded belief states).Corollary 3.14 (Realizability of deterministic reactive finite-memory specifications by FSRAs)Suppose  be a deterministic behavior that is reactive (i.e. length-preserving and prefix-preserving) and has the finite-memory property. Define the induced specification .Then there exists a Finite-State Reactive Agent (FSRA)  such thatIn particular,  is realizable by the singleton controller set  Theorem 3.15 (Strict Separation of Realizability and Enforceability for FSRAs)There exist specifications  and  over a one-symbol observation alphabet  and action alphabet  such that: is strongly enforceable by the set of all FSRAs, but  is not realizable by any set of FSRAs (finite or infinite). is weakly enforceable by a finite set of FSRAs, but  is not strongly enforceable by any finite set of FSRAs.Proof.We treat the two constructions separately.Part 1: Construction of Let  and . For each  write  for the length- observation string. Define the length- action , the unique length- bitstring whose single 1 appears at position . Define the specificationThus, for every observed prefix  the specification allows exactly the single action .(i) Strong enforceability by the set of all FSRAs.For every  there is an FSRA  (With  states) that outputs  on the first  steps and outputs 1 on the -th step. Concretely,  simply counts up to  and emits the described pattern. Hence for each allowed pair  there exists an FSRA (namely ) that produces that exact pair. By definition (Def. 2.11) the set of all FSRAs therefore strongly enforces .(ii) Non-realizability by any set of FSRAs.Suppose  is any set (finite or infinite) of FSRAs and that  realizes , i.e.Then in particular every pair  belongs to the right–hand side, so for every  there exists some  with . But realizability as equality forces more: for every  and every  we must have . Hence for every  and every , . Therefore every  implements the same global map  defined by . In particular,  must contain at least one FSRA whose induced function is equal to .But the function  above is not implementable by any FSRA: the equivalence relation  has infinite index because distinct prefix lengths are distinguished by future behavior; equivalently  is not finite-memory and so (by Theorem 3.13) not implementable by any FSRA. Thus, no FSRA in  can induce , a contradiction. Therefore, no set  of FSRAs realizes .Part 2 – Construction of Again let  and . For each  setthe set of length- bitstrings that contain exactly  ones.(i) Weak enforceability by a finite set.Let  be the simple 2-state FSRA that ignores the input symbol and alternates outputs  (a 2-cycle). For every  that controller’s length- output contains exactly  ones (the ones appear at the even positions). Hence  for every . Therefore, the singleton finite set  satisfies the weak enforceability condition: for every observed history  there exists a FSRA in this finite set whose produced action at  is allowed.(ii) Failure of strong enforceability by any finite set.Fix any finite FSRAs set . For a given length  the set  can produce at most  distinct length- action strings (each FSRA gives one string of length ). But the size of the fiber  equals the binomial coefficient , which grows super-polynomially in . Thus, for sufficiently large  we have . Hence  cannot contain FSRAs producing every allowed string in . Since strong enforceability would require that every allowed action at each history be produced by some FSRA in the set (Def. 2.11), no finite  can strongly enforce . Example (FSRA – Remainder mod k of the binary stream)Setup: Fix integer . Observation Alphabet  (bits observed); action alphabet  (we let each output symbol be the remainder, for readability). Define deterministic reactive  by: for input , the output at step  is the remainder of the integer represented by the prefix  (interpreting the bit string in binary, most-significant-bit first) modulo . Equivalently letand define  (a string of symbols in ). A small working example: () Input .  Output  Finite Memory: Consider the relation  on prefixes:Claim:  iff the current remainders after  and  are equal, i.e. , where  denotes the integer value of  If  then for any continuation  the recurrence  depends only on  and , so . Hence .Conversely, if , take  to be a single symbol  which distinguishes the remainders by the recurrence. So , hence .Therefore, the equivalence classes are exactly the  residue classes modulo . So  has finite index  – finite memory. FSRA: Define  by (the residue classes)..For state  and observation :After reading a prefix whose current remainder is , seeing bit  updates the remainder to  and the FSRA emits that new remainder as the action for that timestep. This is deterministic, length-preserving, and causal.For example: , input :start from .read 1:  emit read 0: , emit .read 1: , emit .Output: Finite-State Reactive Agent for binary remainder mod 3.Specification: where   and For every observation prefix , the unique allowed action history is the sequence of remainders obtained by repeatedly updating the current remainder by  for each next bit . Thus So . Example (DFA and FSRA – Absence of a substring)Let  which is the collection of all finite strings over  with no two consecutive  symbols. The empty string  We first see a DFA  for it, to be specific a minimal DFA (i.e. the smallest (with respect to number of states) DFA which accepts  )States  where: = “no  at the end / start state” (last symbol was  or empty), = “last symbol was  and no forbidden  has occurred so far”, = dead/trap state (we have seen substring ).Start state: .Accepting states:  (we accept as long as  has not occurred).Transition function : Minimal DFA detecting absence of substring “11”Setup: Observations  (each step the agent observes a bit). Actions (at each step the agent emits 1 iff the prefix seen so far is in , else 0). Define the deterministic reactive function  by: for input , the -th output symbol  equalSo, Finite Memory: The relation  has exactly three equivalence classes, so  has finite memory. (We have proven our claim in the appendix)Define three candidate classes by partitioning all prefixes  into three sets: (include  here). (the dead/trap class).These three sets are disjoint, and their union is all prefixes. Note: We are using  here and in the proof given in the appendixEvery prefix belongs to one of the three classes , each class is contained in a single -equivalence class, and the three classes are pairwise distinct. Therefore  has exactly three equivalence classes (finite index). By Definition 2.8 (finite-memory),  has finite memory. As a corollary, any FSRA implementing  requires at least 3 states (one per  class); the standard construction produces the 3-state FSRA. FSRA: Construct FSRA  directly using the DFA states. as above. is the start state. (same as DFA): (emit 1 iff the new state after reading the input is accepting): For example, input .Start in .Read 1: new state , emit  because 1 alone is in . (Output so far: 1.)Read 0: new state , emit  because  Output: 11.Read 1: new state , emit  because . (Output: 111.)If we had input 11 then at second step:Start . read 1 → , emit 1.Read 1 → , emit 0. So, outputs 10 indicating the prefix of length 2 is rejected.FSRA detecting absence of substring “11”Specification: The realized specification is the singleton-valued relationFor each observation prefix , the fiber  is the unique action history produced by the FSRA after prefix .   Example (DFA and FSRA – Constraint on Count and Ending)Fix , observation alphabet . Fix indices , integers . Define(By convention the empty string does not end in  and thus, is not in  unless you explicitly want to treat it otherwise.) We build a deterministic finite automaton that keeps two pieces of summary information:the residue  equal to the current count of  modulo , andthe last seen symbol , where  denotes “empty / no last symbol yet” (useful for the initial state).Define. So .Start state  (zero occurrences seen, no last symbol).Transition  given by𝟙 𝟙 is 1 if  and 0 otherwise.)Accepting set An example,  Setup: We convert the language into a Boolean behavior, which is a convenient FSRA target. Let Observations:  and Actions:  Define  by: for any input  and index 𝟙So  is the binary stream indicating, step-by-step, whether the observed prefix satisfies the two constraints (count-of- and prefix ends with Finite Memory: The classes of  are precisely the DFA states. In particular  has at most  classes, so  is finite memory. Using the similar proof technique like the previous example we can show that index of  is .FSRA: We now give the Finite-State Reactive Agent  that implements the indicator function  above. (same as DFA states). (per-step indicator).Start .Transition  (state update) is the same as the DFA:𝟙Output function 𝟙Take , alphabet . Let  , . Let  and . So, we require  ends with d.Start . Consider input.Read : new residue , last symbol b. New state . Since  but last symbol , (). Output so far 0.Read :  stays 1, last  → state . Not . Output 0. Outputs 00.Read : , last  →  → output 0. Outputs 000.Read : (r=2), last  →  → output 0. Outputs 0000.Read : (r=2), last  → . Since  output 0. Final outputs: 00000.Now consider appending four more  then : continuation .After appending  the residue increases by 4 → new residue .After final  the residue remains , last symbol d, and outputs 1.  ConclusionThinking of agents as simple finite-state machines helps clarify what they can and cannot do. It shows which interactive behaviors can be handled using only limited memory, and when more powerful models are truly necessary. This perspective makes the design space clearer: some goals can be achieved with small, structured controllers, while others require fundamentally richer mechanisms.There are several natural directions to explore. We can study what changes if the agent is allowed to use randomness, make delayed actions, or update its behavior over time through learning. Each of these makes the model more realistic, but also raises new theoretical questions. Important technical problems include: (i) understanding how hard it is to decide whether a specification can be enforced with limited memory, (ii) finding the smallest possible controller that satisfies a given specification, etc.These directions connect verification, synthesis, and learning, promising a practical theory for building simple, verifiable reactive agents. AppendixDeterministic Finite Automata and TransducersDefinition A.1 (Deterministic Finite Automaton – DFA) A DFA is a tuple , where:  is a finite set of states  is a finite input alphabet   is the transition function (assumed total i.e. its defined for all possible tuples )  is the start state  is the set of accepting statesA DFA defines a language  (= set of all finite length string formed using the alphabet in the alphabet set ) by the usual acceptance criterion: a string  is accepted if and only if the state reached after reading  from  belongs to .Here is a simple example to understand it. We will construct a DFA  which that reads a binary string and accepts if the number of 1s in the string is even. (And remains in the final state whenever the number of 1s encountered till now is even.)Formally, define the automaton  as follows:States:  where the state encodes whether the agent has seen an even or odd number of 1s so far.Alphabet: Start state:  (before reading anything, the count of 1s is zero, which is even)Accepting states: Transition function :Reading 0 does nothing to the parity:   Reading 1 flips the parity:DFA for Parity of number of 1sGiven any input string  , the DFA starts in  and updates its state step-by-step according to . The string is accepted if the final state lies in , i.e., if the total number of 1s in  is even.For example:Input 1010 → two 1s → ends in  → acceptedInput 111 → three 1s → ends in   → rejectedYou can view this DFA as a minimal agent with memory:The state is the agent’s internal memory (here just one bit: even vs odd).The alphabet is the agent’s observation space.The transition function is the agent’s update rule.The accepting states encode a goal or success condition. It already captures the core agent loop: observe → update internal state → decide success/failure. Definition A.2 (Mealy Machine – A Deterministic Finite Transducer) A Mealy machine is a tuple , where: is a finite set of states    is a finite input alphabet    is a finite output alphabet    is the (total) transition function  is the output function is the start stateOn input  starting from , the machine produces the output:where the sequence of states   is defined by .In a Mealy machine, each transition produces an output.The output generated at each step depends on both the current state and the current input symbol and is produced synchronously with the consumption of that input symbol.Thus, a Mealy machine defines a functionthat maps every input string to an output string computed incrementally during the run of the machine.Equivalently, the transition and output functions can be combined into a single functionwhereThis formulation emphasizes that output is produced on transitions, which is why Mealy machines are also called deterministic finite-state transducers. Now, let’s discuss a small example. We construct a Mealy machine that, on each input symbol:outputs 0 when it reads 1outputs 11 when it reads 0Thus, the machine maps any binary input string to a binary output string in which every 1 appears in pairs, so the total number of 1s in the output is always even.Define  by: (a single state, the machine is memoryless) (input alphabet) (output alphabet) (stay in  on every input) (output 0 on input 0, output two 1’s on input 1)Equivalently you can combine  and  into one  with   Notation: Input/output. Mealy Machine to output double 1s for each 1 in input. Here is an example run: Input: . State sequence:  (always ) Output (stepwise): Concatenate to get . Count of 1’s in  = = even.This Mealy machine produces strings in the languageEvery string in  has an even number of 1’s, but not every binary string with an even number of 1’s is in  . For example the string 101 has two 1’s (even) but is not in . So the machine guarantees parity but does not generate all even-1 strings.If your goal is all even-1 strings, you need a different transduction strategy (for example allowing outputs ε or single 1’s at some steps plus internal bookkeeping so the machine can place 1’s arbitrarily while keeping total parity even). That requires either a larger state space or different output rules. The Mealy machines defined above allow a single observation to trigger a variable-length burst of output. While this generality is convenient for describing transductions, it is too permissive for modeling agents that must performs a single action at each step in real time.In an online agent setting, we might want a stricter notion: at each timestep, the agent observes exactly one symbol and immediately emits exactly one action. This synchrony ensures that the agent’s behavior is length-preserving.To capture this reactive, per-timestep interaction pattern, we now introduce the standard Mealy machine, obtained by restricting the output function to produce a single symbol per input symbol. Definition A.2′ (Standard Mealy machine) A standard Mealy machine is a tuple  where all components are as in Definition A.2 except the output function is  (a single output symbol per input symbol). On input , the output is ), hence .  Think of this Mealy machine as a tiny reactive agent:Observation space  is what the agent sees at each timestep.Action space  is what it can emit; here actions can be length-2 strings (so actions may produce multiple primitive outputs in one step).The internal state  is the agent’s memory (here trivial: a single state, so no memory).The pair  is the agent’s policy + state-update rule. On every observation the agent immediately emits output determined by .In this example the agent’s policy enforces a global constraint (even parity of 1’s) by a local rule (“whenever you see a 1, emit 11”). This shows two useful points for thinking about agents:Global invariants from local rules: A simple per-step policy can guarantee a nontrivial global property of the interaction history (here: even parity). Expressivity vs constraints: The policy here is conservative it enforces parity but at the cost of reduced expressivity (not all acceptable-looking outputs are reachable).  Definition A.3 (Moore Machine) A Moore machine is a deterministic finite-state transducer given by a tuple , where all components are as in the Mealy machine definition except that the output function is , depending only on the current state.On input , the machine produces output:where  for  (with  the initial state). Equivalently, some conventions output  as the first symbol, either convention yields equivalent expressiveness.Mealy and Moore machines are equivalent in expressive power: every Mealy machine can be converted to an equivalent Moore machine with at most a linear increase in the number of states, and vice versa.  Recall the Mealy machine from the previous example, which on input 0 outputs 0 and on input 1 outputs 11, thereby ensuring that the total number of 1’s in the output is always even.We now construct a Moore machine that produces exactly the same input–output behavior.Formally, define  as follows:States: Input alphabet: Output alphabet: Initial state: Transition function (): Output function (): A Moore Machine which is equivalent to the Mealy Machine in previous exampleThus:whenever the machine enters state , it outputs 0,whenever it enters state , it outputs 11.For example, on input , the state sequence is ,and the output is .As in the Mealy case, each input symbol 1 contributes exactly two 1’s to the output, and each 0 contributes none. Hence every output string has an even number of 1’s.This Moore machine implements the same transduction as the Mealy machine in the previous example, but the output is associated with states rather than transitions. The extra state  is needed to remember which output should be produced after reading the current input symbol. This illustrates the general transformation from Mealy to Moore machines: outputs that depend on  in a Mealy machine are encoded into states whose labels determine the output in a Moore machine.Moore machines label states with outputs rather than labeling transitions. The most general Moore transducer can emit a short burst of output when a state is entered, but that extra generality is unnecessary when modeling agents that act once per timestep. To model a reactive, step-by-step agent it’s clearer to assume each entered state produces exactly one action symbol. This synchronous restriction  gives the standard Moore machine.Definition A.3′ (Standard Moore machine) A standard Moore machine is a tuple  where all components are as in Definition A.3 except the output function is . On input , the output is . From an agent viewpoint, Moore machines correspond to agents whose actions depend only on their current internal state, not directly on the most recent observation. The observation updates the state via , and the action is then emitted as a function of that state alone. Compared to Mealy machines, Moore machines enforce a stronger separation between state update and action selection. This distinction mirrors a common design choice in AI systems: whether actions are allowed to depend directly on raw observations, or only on a processed internal representation. The equivalence between Mealy and Moore machines suggests that this is largely a representational choice rather than a fundamental difference in capability. Remaining ProofsProof for non-finite memory of For each integer  let  (The prefix consisting of  ones), so . We show  and  are not -equivalent whenever .WLOG . Let  (nonzero). Choose an integer  large enough that(Existence: for fixed nonzero , the equation  implies ; this has only finitely many integer solutions, so for sufficiently large  the second condition holds.)Set  and take continuation  (i.e.  more ones). Then, a perfect square, so the output bit at the final step of  (and indeed one of the outputs in the run) is ., which by choice of  is not a square, so the corresponding output bit in that position is .Thus  and  differ, so . Because there are infinitely many , the relation  has infinitely many equivalence classes. Hence  does not have the finite-memory property.Proof for finite memory of :If two prefixes lie in the same class, then they are -equivalent. Take  in the same class, we show  Case 1 (): If a prefix already contains 11, then every extension also contains 11. Hence for any continuation , every longer prefix of  (from the point where 11 first appeared onward) is not in the language, and the same holds for . Concretely, the per-prefix membership bits for  and  coincide at every position (they may differ on earlier positions before the first 11, but those are identical by assumption of same class, both have already passed the 11 event). Thus .Case 2 (): Neither  nor  contains 11, and both end in 0. From that point onward, whether a future prefix  is in the language depends only on the continuation ) and the fact that the last seen symbol before  is 0. Since both  and  share this same “local summary” (no 11 seen, last symbol 0), for every continuation  the sequence of future per-prefix membership bits after position  depends only on  and not on the internal details of  or . Also the membership bits up to  and  are the same shape relative to their class. Hence .Case 3 (): Similar argument: neither contains 11 and both end with 1. Future membership after appending  depends only on the fact “no 11 so far and last symbol = 1” and on . Therefore . So, each class is contained in a single -equivalence class. We now show that for any two distinct classes there exists a continuation that makes the two resulting outputs differ; this proves they are different -equivalence classes.Distinguish  Let  and . Consider the one-symbol continuation . Then:  contains 11 (since u ended with 1), so the output bit at that step is 0 (the prefix ending at that position is not in the language). does not contain 11 (since v ended with 0), so the output bit at that step is 1.Thus  and  differ at the last position, so .Distinguish : Let  and . Take continuation  (a block of  zeros) of any positive length. For every extension , since 11 already occurred inside , none of the prefixes at or after the position where 11 occurred will be in the language, in particular, the outputs at and after position  are zeros. But for  (where v has not seen 11 and ends with 0), all those extended prefixes continue to avoid 11 and so the outputs at these positions are ones. Thus  and  differ (for instance, at position , so . Distinguish : Similar: take . For , outputs after  are zeros; for  and suitable  (even  suffices), some prefixes of  will still avoid 11 and so produce ones. Concretely,  keeps no 11 and so has output 1 at that position, while  has output 0. So Hence the three candidate classes are pairwise distinguishable, so they correspond to three distinct -classes. Discuss ​Read More

Leave a Reply

Your email address will not be published. Required fields are marked *