The Fifth Fourth Postulate of Decision Theory

In 1820, the Hungarian mathematician Farkas Bolyai wrote a desperate letter to his son János, who had become consumed by the same problem that had haunted his father for decades:

“You must not attempt this approach to parallels. I know this way to the very end. I have traversed this bottomless night, which extinguished all light and joy in my life. I entreat you, leave the science of parallels alone… Learn from my example.”

The problem was Euclid’s fifth postulate, the parallel postulate, which states (in one of its equivalent formulations) that through any point not on a given line, there is exactly one line parallel to the given one. For over two thousand years, mathematicians had felt that something was off about this postulate. The other four were short, crisp, self-evident: you can draw a straight line between any two points, you can extend a line indefinitely, you can draw a circle with any center and radius, all right angles are equal. The fifth postulate, by contrast, was long, complicated, and felt more like a theorem that ought to be provable from the others than a foundational assumption standing on its own. Generation after generation of mathematicians attempted to derive it from the remaining four and failed.

Farkas Bolyai begged his son to stay away.

János ignored his father’s advice, but not in the way Farkas feared. Instead of trying to prove the postulate, he asked a question that turned the entire enterprise upside down: what happens if the postulate is simply false? What if you can draw more than one parallel line through a point? Rather than deriving a contradiction (which would have constituted a proof of the fifth postulate by reductio), he found something else: a perfectly consistent geometry, as internally coherent as Euclid’s, just describing a different kind of space. Lobachevsky independently reached the same conclusion around the same time.
The parallel postulate was not wrong, exactly, but it was not necessary. It was one choice among several, and the other choices led to geometries that were not merely logically valid but turned out, a century later, to describe the actual physical universe better than Euclid’s flat space ever could.

Roughly two centuries later, people were discussing decision theories and the axioms of expected utility. The standard argument went roughly like this: rational agents must maximize expected utility. The von Neumann-Morgenstern theorem proves it. If your behavior violates the axioms, you can be Dutch-booked, turned into a money pump, exploited by anyone who notices the inconsistency. You don’t want to be a money pump, do you? Then you must maximize expected utility. QED.

There are four axioms in the von Neumann-Morgenstern framework: completeness, transitivity, continuity, and independence. Three of them are relatively uncontroversial. The fourth, independence, does enormous structural work: it is the axiom that forces preferences to be linear in probabilities, which is mathematically equivalent to requiring that preferences be representable as the expected value of a utility function. Without independence, you still have a well-defined preference functional (by Debreu’s theorem, given the other axioms), you can still order outcomes, you can still make consistent choices, but you are no longer constrained to maximize expected utility specifically. Independence is the fifth postulate of decision theory.

And just as with Euclid’s fifth, I believe the resolution is not to keep trying harder to justify it but to ask: what happens when we drop it? What does the resulting decision theory look like? Is it consistent? Is it useful? Does it perhaps describe actual rational behavior better?

The answer, I will argue, is yes on all three counts. Dropping independence does not lead to irrationality or exploitability.
Several well-known alternatives to expected utility theory exist precisely because they relax independence, and they do so for good reason. Ergodicity economics, in particular, offers a principled and parsimonious replacement that derives the appropriate evaluation function from the dynamics of the stochastic process the agent is embedded in, rather than postulating an ad hoc utility function and taking its expectation. And the LessWrong community’s own research into updateless decision theory has been converging on the same conclusion from a completely different direction: that the most reflectively stable agents may be precisely those who violate the independence axiom.

A Tale of Two Utilities

Before we get to the main argument, we need to clear up a terminological confusion that silently corrupts reasoning about decision theory at the most basic level. The word “utility” refers to two completely different mathematical objects, and the fact that they share a name is sad. This is well known in decision theory, and you are welcome to skip this section if you already know what I am talking about.

The first object is what we might call preference utility, or f1. This is the function that economists use in consumer theory to represent your subjective valuation of bundles of goods under certainty. If you are indifferent between (2 oranges, 3 apples) and (3 oranges, 2 apples), then f1 is constructed so that f1(2,3) = f1(3,2). The crucial property of f1 is that it is ordinal: the only thing that matters is the ranking it induces, not the numerical values it assigns. If f1 assigns 7 to bundle A and 3 to bundle B, all that means is that you prefer A to B. You could replace f1 with any monotonically increasing transformation of it (squaring it, taking its exponential, adding a million) and it would represent exactly the same preferences. The numbers themselves carry no information beyond the ordering.

The second object is von Neumann-Morgenstern utility, or f2.
This is the function that appears inside the expectation operator in expected utility theory. It is constructed not from your preferences over certain bundles but from your preferences over lotteries, that is, over probability distributions on outcomes. The vNM theorem says: if your preferences over lotteries satisfy the four axioms, then there exists a function f2 such that you prefer lottery A to lottery B if and only if E[f2(A)] > E[f2(B)]. Unlike f1, f2 is cardinal: it is defined up to positive affine transformation (you can multiply it by a positive constant and add any constant, but that’s all). Its curvature carries real information, specifically about your attitudes toward risk. A concave f2 means you are risk-averse; a convex one means you are risk-seeking. This curvature is not a feature of f1 at all, because f1 is defined only up to arbitrary monotone transformation, which can make the curvature anything you want.

Now, f2 must agree with f1 on one thing: the ranking of certain (degenerate) outcomes. If you prefer bundle A to bundle B with certainty, then f2(A) > f2(B), just as f1(A) > f1(B). But f2 contains strictly more information than f1. It tells you not just that you prefer A to B, but how much you prefer A to B relative to other pairs, in the precise sense that these ratios of differences determine which gambles you would accept. f1 says nothing about gambles at all.

This distinction is treated in the theoretical literature (see e.g. Mas-Colell, Whinston, and Green, Microeconomic Theory, Chapter 6, which makes the distinction explicit, or Kreps, Notes on the Theory of Choice, which provides a particularly careful treatment). But in practice, in textbooks and in casual discussion, the two get conflated constantly. People say “utility function” without specifying which one they mean, and the ambiguity does real damage.

Here is the specific confusion that matters for our purposes.
When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.

But this is only true if f1 and f2 are the same function, and they are generally not. They coincide only in the special case where the agent’s risk attitudes happen to perfectly match the curvature of their subjective value function, which is to say, only when the agent treats each possible world as independently valuable and sums across them with no regard for the structure of the gamble as a whole. There is no reason to expect this, and empirically it does not hold.

Why does this matter for what follows? Because before addressing the serious arguments for EUT, I want to dispose of “Argument 0”: the claim that EUT is correct because it averages subjective utilities over possible worlds. As we have just seen, that argument is invalid.

Independence Is Sufficient but Not Necessary for Avoiding Exploitation

The strongest case for independence

Let’s steelman the argument for the independence axiom. The best argument does not come from raw intuition (“of course irrelevant alternatives shouldn’t matter!”) but, in my view, from a 1988 result by Peter Hammond, and it goes like this.

Consider an agent facing a decision that unfolds over time, in stages. At stage one, some uncertainty is resolved (say, a coin is flipped). Depending on the result, the agent proceeds to stage two, where they must choose between options.
Before any uncertainty is resolved, the agent can form a plan: “if the coin comes up heads, I will do X; if tails, I will do Y.” Hammond showed that if you accept two properties of sequential decision-making, then you are logically forced to satisfy the independence axiom.

The first property is dynamic consistency: whatever plan you make before the uncertainty is resolved, you actually follow through on once you arrive at the decision node. Your ex ante plan and your ex post choice agree.

The second property is consequentialism (in the decision-theoretic sense, not the ethical one): when you arrive at a decision node, your choice depends only on what is still possible from that node forward.

If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn’t). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between A and B in isolation. Consequentialism says you should evaluate based on what’s still possible. And in isolation, you prefer A. So you switch from your plan (B) to your current preference (A). You are dynamically inconsistent.

A clever adversary who knows your preferences can now exploit this. They offer you a sequence of trades: pay a small amount to switch from plan-A to plan-B before the coin flip (because ex ante you prefer B in context), then after the coin lands heads, pay a small amount to switch from B to A (because ex post you prefer A in isolation). You have paid twice and ended up exactly where you started.

Sufficiency, not necessity

The argument above is valid. But notice its logical structure very carefully.
Hammond proved:

Dynamic consistency + Consequentialism → Independence

This means independence is entailed by the conjunction of dynamic consistency and consequentialism. It does not mean independence is the only way to avoid money pumps. Dynamic consistency alone is what prevents exploitation (if you always follow through on your plans, no one can pump you by getting you to switch mid-stream). Hammond’s result shows that dynamic consistency together with consequentialism implies independence, but this leaves open a crucial possibility: what if you maintain dynamic consistency while giving up consequentialism?

In that case, you can violate independence and still be immune to money pumps. The money pump relies on a specific sequence of events: first, you form a plan; then, partway through, you deviate from it because your local evaluation at the intermediate node (which, under consequentialism, ignores the branches that didn’t happen) differs from your global evaluation when you made the plan. If you simply don’t deviate, if you stick to your plan regardless of what your local preferences at intermediate nodes might suggest, the pump has no lever to pull. The adversary offers you a trade mid-stream, you say “no, I committed to a plan and I’m executing it,” and the pump breaks down.

This is a well-developed position in decision theory, and it comes in (at least) two flavors.

Resolute choice

Edward McClennen developed the theory of resolute choice in his 1990 book Rationality and Dynamic Choice. The idea is straightforward: an agent evaluates the entire decision tree before any uncertainty is resolved, selects the plan that is globally optimal over the full trajectory, commits to it, and then executes it step by step without re-evaluating at intermediate nodes.

The cost is giving up consequentialism. At some intermediate node, the resolute chooser may be executing an action that looks suboptimal if you consider only what’s still possible from that node forward.
They are choosing it because it was part of the globally optimal plan, and the globally optimal plan was evaluated over the entire tree, including branches that, at this point, have already been resolved. Is this “irrational”? I don’t think so. It is the same thing anyone does when they follow through on a commitment that has become locally costly but was globally optimal at the time it was made.

Sophisticated choice

There is a second alternative, which goes in a different direction.[1] A sophisticated chooser accepts that their preferences at future nodes will differ from their current global evaluation, and instead of committing to override those future preferences, they predict them and plan around them. They do backward induction: starting from the last decision node, they figure out what they would actually choose there (given their local preferences at that node), then step back one node and choose optimally given what they know they will do later, and so on back to the first node.

The sophisticated chooser is also immune to money pumps, because they never form a plan that they will later deviate from. Instead, they form a plan that already accounts for their future deviations. The cost is different from resolute choice: instead of sticking to a globally optimal plan despite local temptation, the sophisticated chooser settles for a plan that may be dominated from the ex ante perspective but is at least self-consistent, in the sense that they will actually follow through on it.

Sophisticated choice is less elegant than resolute choice, and for our purposes less interesting, but it is worth mentioning because it demonstrates the same structural point: money-pump immunity does not require independence.
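The structure of the pump, and why resoluteness disarms it, can be sketched in a few lines. The numbers and preference tables below are my own toy illustration, not Hammond's or McClennen's:

```python
# Toy money-pump simulation. An agent whose preference between sub-gambles
# A and B depends on the common component C violates independence. If it
# also re-evaluates locally at each node (consequentialism), it can be
# pumped; a resolute agent that executes its ex ante plan cannot.

EPS = 1.0  # fee the adversary charges per switch

def global_pref(plan):
    """Ex ante ranking of whole plans: in the context of C, plan B looks better."""
    return {"A": 10.0, "B": 11.0}[plan]

def local_pref(option):
    """Ex post ranking at the heads node, with the C branch discarded: A looks better."""
    return {"A": 11.0, "B": 10.0}[option]

def run(agent, start_plan="A"):
    fees = 0.0
    plan = start_plan
    # Before the coin flip: the adversary offers a switch for EPS.
    other = "B" if plan == "A" else "A"
    if global_pref(other) > global_pref(plan):
        plan, fees = other, fees + EPS
    # Coin lands heads; the C branch is off the table.
    if agent == "consequentialist":
        other = "B" if plan == "A" else "A"
        if local_pref(other) > local_pref(plan):  # re-evaluate in isolation
            plan, fees = other, fees + EPS
    # A resolute agent skips the re-evaluation and executes the plan.
    return plan, fees

print(run("consequentialist"))  # ('A', 2.0): back at the start, paid twice
print(run("resolute"))          # ('B', 1.0): paid once for a genuine upgrade
```

The consequentialist cycles back to A having paid two fees; the resolute agent never revisits the choice, so there is no cycle for the adversary to monetize.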
It just requires some form of sequential coherence (either commitment or self-prediction), and independence is only one way to get there, indeed the most restrictive way.

Ergodicity economics as a naturally resolute framework

I closely follow the work of Ole Peters and collaborators and believe it is very cool. There is, unfortunately, a lot of confusion around it, and it is not usually framed in terms of the apparatus of decision theory, so that is precisely what I am going to do now. An agent who maximizes the time-average growth rate of their wealth over their entire trajectory is, I claim, doing resolute choice in the sense McClennen described. They evaluate the whole plan, the entire sequence of bets, the complete wealth process, as a unified object. They ask: “given the dynamics of this stochastic process, what strategy maximizes my long-run growth rate?” And then they execute that strategy.

The “utility function” that falls out of this procedure (via the ergodic mapping, which finds the transformation that renders the wealth process ergodic, so that time and ensemble averages coincide) depends on the dynamics of the process. For multiplicative dynamics, you get logarithmic utility (the Kelly criterion). For additive dynamics, you get linear utility. For more exotic dynamics, you get whatever transformation the ergodic mapping produces. This means the effective utility function is context-dependent: it changes when the stochastic environment changes. And context-dependence of the utility function is precisely what the independence axiom forbids, because independence says your preference between sub-gambles should not depend on what else is in the package.

So the EE agent violates independence. But are they exploitable? No. And the reason maps exactly onto the resolute choice framework. The EE agent has committed to a trajectory-level optimization: maximize time-average growth.
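To make the multiplicative case concrete, here is a minimal computation, using a +50%/−40% coin-flip gamble of the kind common in the ergodicity-economics literature, showing that the ensemble-average and time-average evaluations of the same gamble disagree in sign:

```python
import math

# A multiplicative gamble: each round, wealth is multiplied by 1.5 (heads)
# or 0.6 (tails), each with probability 1/2.
factors, p = [1.5, 0.6], 0.5

# Ensemble average: expected growth factor per round.
ensemble = p * factors[0] + p * factors[1]                       # 1.05 > 1

# Time average: expected log growth factor, i.e. what log-utility sees.
time_avg = p * math.log(factors[0]) + p * math.log(factors[1])   # < 0

print(f"ensemble growth factor: {ensemble:.3f}")    # 1.050
print(f"time-average growth rate: {time_avg:.4f}")  # about -0.0527
```

Expected wealth grows 5% per round, yet the typical individual trajectory decays; the logarithm is exactly the transformation under which the per-round evaluation matches what a single agent experiences over time.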
They don’t re-evaluate at intermediate nodes by asking “given that this branch of the uncertainty has been resolved, what do my local preferences say?” They continue executing the trajectory-level strategy because it was derived from a global evaluation of the entire process. The money pump has no leverage because there is no gap between the agent’s ex ante plan and their ex post behavior. They planned to Kelly-bet (or whatever the ergodic mapping prescribes), and they are Kelly-betting, regardless of what the local branch structure looks like at any given moment.

This connection between ergodicity economics and resolute choice has not, to my knowledge, been articulated before. But it is, I think, the cleanest way to see why EE can violate independence without irrationality.

Now, you may or may not accept the entire EE program, but, at the very minimum, I think the conclusion that an agent should pay attention to the dynamics of the gamble, and that the concrete “utility function” should depend on the gamble, is undeniably valid.

The broader landscape

The fact that independence is the weak point of the vNM framework is reflected in the structure of the entire field of generalized decision theory, where the majority of alternative frameworks are built specifically by relaxing or replacing the independence axiom:

Rank-dependent utility (Quiggin, 1982) replaces independence with “comonotonic independence” (independence holds only for gambles that rank outcomes in the same order). The result is a preference functional that includes a probability weighting function, which distorts the cumulative distribution before integrating against the utility function.

Cumulative prospect theory (Tversky and Kahneman, 1992) combines probability weighting with reference-dependence and loss aversion.
It was developed to explain empirical patterns of choice under risk, and it violates independence in multiple ways.

Quadratic utility (Chew, Epstein, and Segal) allows the preference functional to be a bilinear form in probabilities, meaning it is quadratic rather than linear in the probability measure. This captures something akin to sensitivity to the variance of a gamble, not just its mean.

Betweenness preferences (Dekel, 1986; Chew, 1989) weaken independence to the requirement that if you are indifferent between two lotteries, any mixture of them is equally good. This is strictly weaker than full independence and yields preference functionals defined by implicit functional equations rather than explicit integrals.

This convergence is not coincidental. When multiple independent research programs, developed by different people with different motivations over several decades, all arrive at the same structural move (relax independence), it suggests that the constraint being relaxed really is too strong.

Allais and Ellsberg Behavior Is Rational

Allais Paradox

The Allais paradox is the oldest and most famous demonstration that people systematically violate the independence axiom. The setup, in its simplified form, goes like this.

In situation one, you choose between gamble A (certainty of one million euros) and gamble B (89% chance of one million, 10% chance of five million, 1% chance of nothing). Most people choose A.

In situation two, you choose between gamble C (11% chance of one million, 89% chance of nothing) and gamble D (10% chance of five million, 90% chance of nothing). Most people choose D.

But the move from situation one to situation two is exactly a common-consequence substitution: you strip out the same 89% component from both options in each pair. Independence says this shouldn’t change your preference, so if you chose A over B, you should choose C over D.
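The common-consequence structure can be checked mechanically: whatever utility function you plug in, the expected-utility gap between A and B is algebraically identical to the gap between C and D, which is why EUT cannot permit the A-and-D pattern. A quick sketch:

```python
import random

# For ANY utility assignment u, expected utility gives the same sign to
# "A vs B" and "C vs D": the EU differences are identical term by term.
# Outcomes are in millions of euros; gambles are {outcome: probability}.
A = {1: 1.0}
B = {1: 0.89, 5: 0.10, 0: 0.01}
C = {1: 0.11, 0: 0.89}
D = {5: 0.10, 0: 0.90}

def eu(gamble, u):
    return sum(p * u[x] for x, p in gamble.items())

random.seed(0)
for _ in range(1000):
    # a random utility assignment over the three outcomes
    u = {0: random.random(), 1: random.random(), 5: random.random()}
    diff1 = eu(A, u) - eu(B, u)  # = 0.11*u[1] - 0.10*u[5] - 0.01*u[0]
    diff2 = eu(C, u) - eu(D, u)  # = 0.11*u[1] - 0.10*u[5] - 0.01*u[0]
    assert abs(diff1 - diff2) < 1e-12  # equal, whatever u is
```

No choice of u can make diff1 positive and diff2 negative, so the observed A-and-D pattern falls outside expected utility entirely.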
People do the opposite, and this is treated as evidence of irrationality, a “paradox” revealing that human risk cognition is systematically biased.

I want to argue that it is not a paradox at all. It is rational behavior that only looks paradoxical if you insist on evaluating each branch of a lottery independently of every other branch, which is exactly what the independence axiom demands and exactly what a holistic reasoner should not do.

Consider why people choose A in situation one. The certainty of one million is qualitatively different from a 99% chance of getting at least one million with a 1% chance of getting nothing. That 1% of nothing looms large because of what it means in context: you are giving up a sure million for a gamble that could leave you with nothing. The certain outcome provides a floor, a guaranteed trajectory, and evaluating the gamble requires considering what happens along the entire trajectory, including the branch where you get nothing while knowing you could have had a certain million.

Now consider situation two. Both options involve a high probability of getting nothing. There is no certainty to give up, no floor to sacrifice. The context has fundamentally changed: you are already in a world where you will probably get nothing, and the question is just whether to take a slightly higher probability of a moderate payout or a slightly lower probability of a much larger one. In this context, going for the higher expected value is sensible.

The shift from A-over-B to D-over-C is a rational response to the fact that the overall risk structure of the gamble has changed. The “common component” (the 89% that was stripped out) was not psychologically or strategically inert: in situation one, it was providing certainty; in situation two, it was providing nothing.
Stripping it out changed the context in which the remaining options are evaluated, and a holistic reasoner, one who evaluates their total exposure rather than decomposing gambles into independent branches, should respond to that change.

This is precisely the point we made with the example in the introduction to section 3. If the common component C is a large safety net, you can afford to take more risk on the remaining branch. If C is negligible, you should be more conservative. Your preference between A and B should depend on what else is in the package, because you are one agent facing the total distribution, not a collection of independent sub-agents each evaluating one branch in isolation.

The important distinction here is between the descriptive claim and the normative claim. The descriptive claim (people violate independence in the Allais pattern) has been known since 1953 and is not controversial. What is controversial is the normative status of this behavior. The standard treatment in economics and in much of the rationality community says: people violate the axiom, this is a bias, and ideally they should be corrected. The position I am defending is the opposite: people violate the axiom because the axiom is too strong, their behavior reflects a rational holistic evaluation of the gamble’s structure, and the “correction” (forcing independence-compliant preferences) would make them worse decision-makers, not better ones.

Ellsberg Paradox

The Ellsberg paradox involves a related but distinct phenomenon: ambiguity aversion. The classic setup: an urn contains 30 red balls and 60 balls that are either black or yellow in unknown proportion. You can bet on the color of a drawn ball. Most people prefer betting on red (known probability of 1/3) over betting on black (unknown probability, anywhere from 0 to 2/3), even though if you assign your best-estimate probability of 1/3 to black, the expected values are identical.
This is typically treated as another “irrational” bias: the probabilities are the same in expectation, so why should ambiguity matter?

Ergodicity economics provides a natural and, I think, quite elegant resolution, and it comes in two layers.

The first layer is a direct Jensen’s inequality argument. Under multiplicative dynamics, the time-average growth rate of a repeated gamble is a concave function of the probability. For a simple multiplicative bet with fraction f of wealth wagered, the growth rate is something like g(p) = p·log(1+f) + (1−p)·log(1−f), which is concave in p.

Now consider the Ellsberg urn. The number of black balls could be 0, 1, 2, …, 60. If you are maximally uncertain and average uniformly over these possibilities, the expected proportion is 30/60 = 1/2, which matches the known-probability case. An ensemble-average reasoner sees no difference: E[p] = 1/2 in both cases, so the expected value of the gamble is the same.

But concavity of g in p means that Jensen’s inequality applies:

E[g(p)] < g(E[p])

The average time-average growth rate across all possible urn compositions is strictly less than the time-average growth rate you get when the probability is known to be 1/2. Each distinct urn composition (0 black balls, 1 black ball, 2 black balls, and so on) defines a different multiplicative process with a different time-average growth rate. You can compute all 61 of these growth rates and average them, and that average will be strictly lower than the single growth rate corresponding to the known 1/2 probability, because you are averaging a concave function. The gap is mathematically inevitable, and it is completely invisible to ensemble averaging.

The second layer is about strategic optimality. Even beyond the Jensen’s inequality point, an agent under multiplicative dynamics has a further reason to prefer known probabilities: strategy calibration.
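The calibration point can be made concrete with a small sketch. The even-money bet and the specific numbers below are my own illustration, not from the Ellsberg literature: if the true probability differs from the point estimate you sized your bet for, your time-average growth rate suffers.

```python
import math

# Strategy calibration under multiplicative dynamics: for an even-money
# bet, the growth-optimal (Kelly) fraction is f* = 2p - 1, and betting
# any other fraction lowers the time-average growth rate.

def growth(p, f):
    """Time-average growth rate: bet fraction f, win probability p, even odds."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def kelly(p):
    return max(0.0, 2 * p - 1)  # never bet on a losing proposition

true_p = 0.5        # the urn turned out to be balanced
believed_p = 2 / 3  # the point estimate you calibrated your bet to

g_calibrated = growth(true_p, kelly(true_p))        # 0.0: correctly sit out
g_miscalibrated = growth(true_p, kelly(believed_p))  # negative: over-betting

print(g_calibrated, g_miscalibrated)
```

With the true probability known, the agent bets the correct fraction (here, zero) and preserves wealth; sized for the wrong point estimate, the same agent's trajectory shrinks at a strictly positive rate.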
The optimal strategy (the Kelly fraction, or more generally whatever the ergodic mapping prescribes) depends on the probabilities. When probabilities are known, you can tune your bet size precisely and achieve the optimal time-average growth rate. When probabilities are ambiguous, you cannot.

The Kelly criterion is uniquely optimal: any deviation from the correct Kelly fraction, whether you bet too aggressively or too conservatively, strictly reduces the time-average growth rate. If the true probability of black is 1/6 and you bet as if it were 1/3, you are over-betting and your growth rate suffers. If the true probability is 1/2 and you bet as if it were 1/3, you are under-betting and your growth rate also suffers, less dramatically but still measurably. Regardless of what the true probability turns out to be, as long as it differs from your point estimate, your trajectory-level performance is strictly worse than what you could have achieved with known probabilities.

So the agent who prefers known probabilities is, in effect, saying: “I want to be able to optimize my strategy for the actual stochastic process I am embedded in, and I can only do that if I know the parameters of that process.”

How LessWrong Has Engaged with This

The LessWrong community has discussed the independence axiom and related questions multiple times over the past fifteen years, and the landscape is instructive. The pieces are mostly there: the right questions have been asked, the right concerns have been raised, and in one remarkable comment, the right conclusion has been stated almost verbatim. But the pieces have never been assembled into a unified argument.

Armstrong’s “Expected Utility Without the Independence Axiom” (2009)

Stuart Armstrong’s post is, to my knowledge, the earliest serious treatment of dropping independence on LessWrong, and it gets a lot right.
Armstrong correctly identifies independence as the most controversial vNM axiom and explores what kind of decision theory remains when you drop it. This was valuable groundwork, and it is to Armstrong’s credit that he took the question seriously at a time when the LessWrong consensus was (and to a significant extent still is) that violating any vNM axiom is ipso facto irrational.

However, Armstrong reaches one conclusion that I think is wrong. His central result is that when an agent faces many lotteries, and those lotteries are independent and have bounded variance, the agent’s aggregate behavior converges to expected utility maximization even without the independence axiom. He writes: “Hence the more lotteries we consider, the more we should treat them as if only their mean mattered. So if we are not risk loving, and expect to meet many lotteries with bounded SD in our lives, we should follow expected utility.”

This is a correct result within its assumptions, but the assumptions exclude exactly the cases where abandoning independence matters most. Armstrong’s convergence argument relies on two things: that the lotteries are independent of each other, and that they aggregate additively (so that the law of large numbers, in its standard additive form, applies to their sum). Under these conditions, yes, the variance of the aggregate shrinks relative to the mean, and the mean dominates, which is equivalent to expected utility maximization.

But for an agent making sequential decisions where wealth compounds multiplicatively, the aggregation is not additive. The relevant law of large numbers for multiplicative processes concerns the geometric mean, not the arithmetic mean. And the geometric mean of a set of multiplicative gambles is determined by the time-average growth rate (the expected logarithm of the growth factor), not by the expected value. The convergence is to the time average, not the ensemble average.
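The difference between additive and multiplicative aggregation is easy to exhibit numerically. A sketch of my own, using a +50%/−40% coin-flip gamble: a long wealth trajectory tracks the expected logarithm of the growth factor, not the logarithm of the expected factor.

```python
import math
import random

# A single wealth trajectory through many rounds of a multiplicative
# gamble (x1.5 or x0.6, fair coin) converges to the time-average growth
# rate E[log factor], not to the ensemble rate log E[factor].
random.seed(1)

ROUNDS = 100_000
log_wealth = 0.0
for _ in range(ROUNDS):
    log_wealth += math.log(1.5 if random.random() < 0.5 else 0.6)

realized = log_wealth / ROUNDS  # per-round growth of this one path
time_avg = 0.5 * math.log(1.5) + 0.5 * math.log(0.6)  # about -0.0527
ensemble = math.log(0.5 * 1.5 + 0.5 * 0.6)            # about +0.0488

print(realized, time_avg, ensemble)
# the realized rate sits near time_avg and far from ensemble
```

This is the multiplicative law of large numbers in action: the additive LLN that drives Armstrong's convergence result applies to the log-wealth increments, and therefore pulls the trajectory toward the time average, with the ensemble average left describing no individual path at all.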
The same line of reasoning can be applied to any non-additive (so not only multiplicative) gamble.

Scott Garrabrant’s comment (2022) — Updatelessness and independence

In December 2022, Scott Garrabrant left a comment beneath a post on EUT that I consider one of the most important things written on LessWrong in the context of this question. I want to quote the core of it and then explain why it matters for my argument. Garrabrant wrote:

“My take is that the concept of expected utility maximization is a mistake. […] As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently. […] Von Neumann did not notice this mistake because he was too busy inventing the entire field. The point where we discover updatelessness is the point where we are supposed to realize that all of utility theory is wrong. I think we failed to notice.”

The argument, unpacked, goes like this. The vNM framework, and every axiomatization of utility that Garrabrant is aware of, implicitly assumes updating: when you observe something (say, a coin comes up heads), you condition on that observation, and from that point forward you only care about worlds consistent with it. The worlds where the coin came up tails are discarded from your deliberation. This is Bayesian updating applied to preferences, not just beliefs, and it is so deeply embedded in the framework that it is usually invisible.

But the LessWrong/MIRI decision theory research program discovered, through work on Updateless Decision Theory and its successors, that updating is not a requirement of rationality. An updateless agent does not narrow its caring when it makes an observation.
Now here is the connection, which is the reason I am presenting Garrabrant’s comment at length.

The updating step that Garrabrant identifies as the hidden assumption in utility theory is, formally, the same thing as the branch-by-branch evaluation that the independence axiom encodes. When you update on “the coin came up heads,” you evaluate your remaining options conditional on this observation, ignoring the tails branch. Independence says this conditional evaluation should be the same regardless of what was on the tails branch, precisely because you are supposed to discard the tails branch after updating. An updateless agent, by contrast, evaluates the entire policy (covering both heads and tails) as a single object, and the value of the heads-branch action depends on what the tails-branch action is, because both are part of the same globally optimized policy.

This is structurally parallel to the EE critique: the time-average reasoner evaluates the entire trajectory (all branches, the full compounding structure) as a unified object, rather than decomposing it into independent branches and evaluating each one after updating on which branch was realized.
The EE agent is, in Garrabrant’s terminology, updateless with respect to the temporal unfolding of their wealth process.

Two completely independent lines of thought, one coming from physics and the mathematics of stochastic processes, the other coming from the philosophical and logical analysis of decision theory within the rationalist community, converge on the same structural conclusion: the independence axiom encodes a branch-by-branch, post-update evaluation that is not required by rationality, and the most reflectively coherent agents are those who evaluate holistically rather than branch-by-branch.

Academian’s “VNM Expected Utility Theory: Uses, Abuses, and Interpretation” (2010)

Academian’s post covers a lot of ground, but the section relevant to our discussion is section 5, titled “The independence axiom isn’t so bad.”

Academian’s defense of independence rests on what he calls the Contextual Strength (CS) interpretation of vNM utility. The idea is that vNM-preference should be understood as “strong preference” within a given context of outcomes. When the vNM formalism says you are indifferent between two options (S = D in the parent-giving-a-car-to-children example), this does not mean you have no preference at all. It means you have no preference strong enough that you would sacrifice probabilistic weight on outcomes that matter in the current context in order to indulge it. Under this interpretation, the independence axiom’s requirement that S = D implies S = F = D (where F is the coin-flip mixture) just means you wouldn’t sacrifice anything contextually important to get the fair coin flip over either deterministic option. You can still prefer the coin flip in some weaker sense; you just can’t prefer it strongly enough to trade off against the things that actually matter.

I want to acknowledge that this is a well-crafted defense, and Academian is admirably honest about most of its limitations.
But the CS defense has a critical limitation that Academian does not address: it works only for small, contextually negligible independence violations. The parent-and-car example involves a marginal preference for fairness that is, as Academian argues, plausibly too weak to warrant probabilistic sacrifice in a context that includes weighty outcomes. Fine. But the independence violations that arise in the settings this article is concerned with are not marginal at all.

Consider again the gamble example from section 3. You are choosing between gambles A and B, and the common component C is either a large safety net (ten million euros) or a trivial amount (five euros). Your preference between A and B flips depending on what C is: with the large safety net, you take the risky option; without it, you take the safe one. This is not a whisper of a preference that disappears when larger considerations are in play, but a robust, large-magnitude shift in risk strategy driven by the structural properties of your total exposure. The CS interpretation cannot accommodate this, because the whole point of CS is that independence violations are contextually negligible, and in the cases that matter for EE and for real-world sequential decision-making, they are anything but.

Fallenstein’s “Why You Must Maximize Expected Utility” (2012)

Benja Fallenstein’s post is the most rigorous and carefully argued defense of expected utility maximization on LessWrong, and it is the one that most directly claims what the title says: that you must maximize expected utility. If the argument of this article is correct, Fallenstein’s post is where the disagreement is sharpest.

Fallenstein’s setup is this. You have a “genie,” a perfect Bayesian AI, that must choose among possible actions on your behalf. The genie comprehends the set of all possible “giant lookup tables” (complete plans specifying what to do in every conceivable situation) and selects the one that best satisfies your preferences.
Preferences are defined over “outcomes,” which are data structures containing all and only the information about the world that matters to your terminal values. The genie evaluates probability distributions over these outcomes.

Within this setup, Fallenstein argues for independence by analogy with conservation of expected evidence. He writes: “The Axiom of Independence is equivalent to saying that if you’re evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no less attractive, then you should already be finding it more attractive than you do.” He then addresses the parent/car/coin counterexample by arguing that if you care about the randomization mechanism, this should already be encoded in the outcome, not in the preference over lotteries.

This is a strong argument, and it is correct within its setup. If you accept the timeless-genie framing, where a perfect Bayesian evaluates all possible world-histories simultaneously and chooses among complete plans from a god’s-eye view, then independence is very nearly trivially true. The genie faces a single, static decision over probability distributions. There is no temporal sequence, no compounding, no intermediate node at which the genie might re-evaluate. The genie simply picks the best plan, and the best plan is the one whose probability distribution over outcomes ranks highest. In this setting, asking whether the “common component” should influence the evaluation is like asking whether an irrelevant column in a spreadsheet should affect which row you pick: obviously not, because you’re evaluating the whole row at once.

But the force of this argument depends entirely on whether you accept the timeless-genie framing as the correct idealization of rational decision-making.
And this is precisely what ergodicity economics and the updatelessness research program both call into question.

The genie exists outside of time. It surveys the entire space of possible histories from above, assigns probabilities, and computes weighted sums. This is the ensemble-averaging perspective, formalized as a decision procedure. And it is a perfectly coherent idealization, one of the possible “geometries” of decision theory. But it is not the only one. An agent who is embedded in a temporal process, who faces sequential decisions with compounding consequences, who cannot step outside of time and evaluate all histories simultaneously, lives in a different geometry. For this agent, the temporal structure of the process, the order in which decisions are made, the way outcomes compound, the path-dependence of wealth dynamics are the central features of the decision problem.

Fallenstein’s argument shows that if you accept the timeless-genie setup, you get expected utility maximization, not that you must accept the timeless-genie setup. The question that EE raises, whether a temporally embedded agent facing sequential compounding decisions should evaluate trajectories holistically rather than decomposing them into independent branches, falls entirely outside Fallenstein’s framing. It is not addressed because it cannot be addressed within that framing, just as questions about the curvature of space cannot be addressed within Euclidean geometry. You need a different geometry to even ask them.

Just Give Up on EUT

I think we just need to abandon EUT, once and for all. It is a bad description of humans, a bad description of AIs, and a bad description of potential superintelligences. The argument for this conclusion has three legs, and I want to make sure all three are visible:

1. Theoretical. The independence axiom is sufficient but not necessary for avoiding Dutch book exploitation.

2. Empirical.
The Allais paradox, the Ellsberg paradox, and the general instability of estimated risk-aversion parameters across contexts are not bugs in human cognition that education or debiasing should correct, but features: exactly what you would expect from agents who evaluate their total risk exposure holistically rather than branch by branch.

3. Convergence of independent research programs. The ergodicity economics program and the updateless decision theory program, independently and from completely different starting points, converge on the same structural insight: the branch-by-branch, post-update evaluation that the independence axiom encodes is just one possible rational way to face uncertainty, and there are others.

The rationalist community has an enormous intellectual investment in expected utility maximization. It is woven into the foundations of how this community thinks about decision theory, about AI alignment, about what it means for an agent to be rational. Eliezer’s sequences treat EU maximization as nearly axiomatic. The vNM theorem is invoked routinely as a constraint on what rational agents can look like. A great deal of alignment-relevant reasoning (about corrigibility, about value learning, about what kinds of objective functions a superintelligent agent would have) implicitly assumes that sufficiently rational agents are EU maximizers. All the more grace, then, in pivoting away from EUT and acknowledging the problems with the independence axiom.

János Bolyai wrote to his father: “Out of nothing I have created a strange new universe.” We shouldn’t be afraid to do the same in decision theory.

[1] The terminology “sophisticated choice” itself was consolidated by Hammond (1976) and especially by McClennen (1990), who contrasted it explicitly with resolute choice. So McClennen’s book is actually the key source that sets up the three-way taxonomy: naive, sophisticated, and resolute.
On The Independence Axiom
Several well-known alternatives to expected utility theory exist precisely because they relax independence, and they do so for a reason. Ergodicity economics, in particular, offers a principled and parsimonious replacement that derives the appropriate evaluation function from the dynamics of the stochastic process the agent is embedded in, rather than postulating an ad hoc utility function and taking its expectation. And the LessWrong community’s own research into updateless decision theory has been converging on the same conclusion from a completely different direction: that the most reflectively stable agents may be precisely those who violate the independence axiom.

A Tale of Two Utilities

Before we get to the main argument, we need to clear up a terminological confusion that silently corrupts reasoning about decision theory at the most basic level. The word “utility” refers to two completely different mathematical objects, and the fact that they share a name is unfortunate. This is well known in decision theory, and you are welcome to skip this section if you know what I am talking about.

The first object is what we might call preference utility, or f1. This is the function that economists use in consumer theory to represent your subjective valuation of bundles of goods under certainty. If you are indifferent between (2 oranges, 3 apples) and (3 oranges, 2 apples), then f1 is constructed so that f1(2,3) = f1(3,2). The crucial property of f1 is that it is ordinal: the only thing that matters is the ranking it induces, not the numerical values it assigns. If f1 assigns 7 to bundle A and 3 to bundle B, all that means is that you prefer A to B. You could replace f1 with any monotonically increasing transformation of it (squaring it, taking its exponential, adding a million) and it would represent exactly the same preferences. The numbers themselves carry no information beyond the ordering.

The second object is von Neumann-Morgenstern utility, or f2.
This is the function that appears inside the expectation operator in expected utility theory. It is constructed not from your preferences over certain bundles but from your preferences over lotteries, over probability distributions on outcomes. The vNM theorem says: if your preferences over lotteries satisfy the four axioms, then there exists a function f2 such that you prefer lottery A to lottery B if and only if E[f2(A)] > E[f2(B)]. Unlike f1, f2 is cardinal: it is defined up to affine transformation (you can multiply it by a positive constant and add any constant, but that’s all). Its curvature carries real information, specifically about your attitudes toward risk. A concave f2 means you are risk-averse; a convex one means you are risk-seeking. This curvature is not a feature of f1 at all, because f1 is defined up to arbitrary monotone transformation, which can make the curvature anything you want.

Now, f2 must agree with f1 on one thing: the ranking of certain (degenerate) outcomes. If you prefer bundle A to bundle B with certainty, then f2(A) > f2(B), just as f1(A) > f1(B). But f2 contains strictly more information than f1. It tells you not just that you prefer A to B, but how much you prefer A to B relative to other pairs, in the precise sense that these ratios of differences determine what gambles you would accept. f1 says nothing about gambles at all.

This distinction is treated in the theoretical literature (see e.g. Mas-Colell, Whinston, and Green, Microeconomic Theory, Chapter 6, which makes the distinction explicit, or Kreps, Notes on the Theory of Choice, which provides a particularly careful treatment). But in practice, in textbooks, in casual discussion, the two get conflated constantly. People say “utility function” without specifying which one they mean, and the ambiguity does real damage.

Here is the specific confusion that matters for our purposes.
When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.

But this is only true if f1 and f2 are the same function, and they are generally not. They coincide only in the special case where the agent’s risk attitudes happen to perfectly match the curvature of their subjective value function, which is to say, only when the agent treats each possible world as independently valuable and sums across them with no regard for the structure of the gamble as a whole. There is no reason to expect this, and empirically it does not hold.

Why does this matter for what follows? Because before addressing the serious arguments for EUT, I want to dispose of “Argument 0”: the claim that EUT is good because it averages subjective utilities over possible worlds. That claim is invalid, because the function being averaged is not the agent’s subjective valuation.

Independence Is Sufficient but Not Necessary for Avoiding Exploitation

The strongest case for independence

Let’s steelman the argument for the independence axiom. The best argument does not come from raw intuition (“of course irrelevant alternatives shouldn’t matter!”) but, in my view, from a 1988 result by Peter Hammond, and it goes like this.

Consider an agent facing a decision that unfolds over time, in stages. At stage one, some uncertainty is resolved (say, a coin is flipped). Depending on the result, the agent proceeds to stage two, where they must choose between options.
Before any uncertainty is resolved, the agent can form a plan: “if the coin comes up heads, I will do X; if tails, I will do Y.” Hammond showed that if you accept two properties of sequential decision-making, then you are logically forced to satisfy the independence axiom.

The first property is dynamic consistency: whatever plan you make before the uncertainty is resolved, you actually follow through on once you arrive at the decision node. Your ex ante plan and your ex post choice agree.

The second property is consequentialism (in the decision-theoretic sense, not the ethical one): when you arrive at a decision node, your choice depends only on what is still possible from that node forward.

If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn’t). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between A and B in isolation. Consequentialism says you should evaluate based on what’s still possible. And in isolation, you prefer A. So you switch from your plan (B) to your current preference (A). You are dynamically inconsistent.

A clever adversary who knows your preferences can now exploit this. They offer you a sequence of trades: pay a small amount to switch from plan-A to plan-B before the coin flip (because ex ante you prefer B in context), then after the coin lands heads, pay a small amount to switch from B to A (because ex post you prefer A in isolation). You have paid twice and ended up exactly where you started.

Sufficiency, not necessity

The argument above is valid. But notice its logical structure very carefully.
Hammond proved:

Dynamic consistency + Consequentialism → Independence

This means independence is entailed by the conjunction of dynamic consistency and consequentialism. It does not mean independence is the only way to avoid money pumps. Dynamic consistency alone is what prevents exploitation (if you always follow through on your plans, no one can pump you by getting you to switch mid-stream). And the Hammond result shows that dynamic consistency together with consequentialism implies independence, but this leaves open a crucial possibility: what if you maintain dynamic consistency while giving up consequentialism?

In that case, you can violate independence and still be immune to money pumps. The money pump relies on a specific sequence of events: first, you form a plan; then, partway through, you deviate from it because your local evaluation at the intermediate node (which, under consequentialism, ignores the branches that didn’t happen) differs from your global evaluation when you made the plan. If you simply don’t deviate, if you stick to your plan regardless of what your local preferences at intermediate nodes might suggest, the pump has no lever to pull. The adversary offers you a trade mid-stream, you say “no, I committed to a plan and I’m executing it,” and the pump breaks down.

This is a well-developed position in decision theory, and it comes in (at least) two flavors.

Resolute choice

Edward McClennen developed the theory of resolute choice in his 1990 book Rationality and Dynamic Choice. The idea is straightforward: an agent evaluates the entire decision tree before any uncertainty is resolved, selects the plan that is globally optimal over the full trajectory, commits to it, and then executes it step by step without re-evaluating at intermediate nodes.

The cost is giving up consequentialism. At some intermediate node, the resolute chooser may be executing an action that looks suboptimal if you consider only what’s still possible from that node forward.
They are choosing it because it was part of the globally optimal plan, and the globally optimal plan was evaluated over the entire tree, including branches that, at this point, have already been resolved. Is this “irrational”? I don’t think so. It is the same thing that anyone does when they follow through on a commitment that has become locally costly but was globally optimal at the time it was made.

Sophisticated choice

There is a second alternative, which goes in a different direction.[1] A sophisticated chooser accepts that their preferences at future nodes will differ from their current global evaluation, and instead of committing to override those future preferences, they predict them and plan around them. They do backward induction: starting from the last decision node, they figure out what they would actually choose there (given their local preferences at that node), then step back one node and choose optimally given what they know they will do later, and so on back to the first node.

The sophisticated chooser is also immune to money pumps, because they never form a plan that they will later deviate from. Instead, they form a plan that already accounts for their future deviations. The cost is different from resolute choice: instead of sticking to a globally optimal plan despite local temptation, the sophisticated chooser settles for a plan that may be dominated from the ex ante perspective but is at least self-consistent in the sense that they will actually follow through on it.

Sophisticated choice is less elegant than resolute choice, and for our purposes less interesting, but it is worth mentioning because it demonstrates the same structural point: money-pump immunity does not require independence.
It just requires some form of sequential coherence (either commitment or self-prediction), and independence is only one way to get there, indeed the most restrictive way.

Ergodicity economics as a naturally resolute framework

I closely follow the work of Ole Peters and collaborators and believe it is very cool. There is, unfortunately, a lot of confusion around it, and it is usually not framed in terms of the apparatus of decision theory; that is precisely what I am going to do now. An agent who maximizes the time-average growth rate of their wealth over their entire trajectory is, I claim, doing resolute choice in the sense McClennen described. They evaluate the whole plan, the entire sequence of bets, the complete wealth process, as a unified object. They ask: “given the dynamics of this stochastic process, what strategy maximizes my long-run growth rate?” And then they execute that strategy.

The “utility function” that falls out of this procedure (via the ergodic mapping, which finds the transformation that renders the wealth process ergodic, so that time and ensemble averages coincide) depends on the dynamics of the process. For multiplicative dynamics, you get logarithmic utility (the Kelly criterion). For additive dynamics, you get linear utility. For more exotic dynamics, you get whatever transformation the ergodic mapping produces. This means the effective utility function is context-dependent: it changes when the stochastic environment changes. And context-dependence of the utility function is precisely what the independence axiom forbids, because independence says your preference between sub-gambles should not depend on what else is in the package.

So the EE agent violates independence. But are they exploitable? No. And the reason maps exactly onto the resolute choice framework. The EE agent has committed to a trajectory-level optimization: maximize time-average growth.
They don’t re-evaluate at intermediate nodes by asking “given that this branch of the uncertainty has been resolved, what do my local preferences say?” They continue executing the trajectory-level strategy because it was derived from a global evaluation of the entire process. The money pump has no leverage because there is no gap between the agent’s ex ante plan and their ex post behavior. They planned to Kelly-bet (or whatever the ergodic mapping prescribes), and they are Kelly-betting, regardless of what the local branch structure looks like at any given moment.

This connection between ergodicity economics and resolute choice has not been articulated before. But it is, I think, the cleanest way to see why EE can violate independence without irrationality. Now, you may or may not accept the entire EE program, but, at the very minimum, I think the conclusion that the agent should pay attention to the dynamics of the gamble, and that the concrete “utility function” should depend on the gamble, is undeniably valid.

The broader landscape

The fact that independence is the weak point of the vNM framework is reflected in the structure of the entire field of generalized decision theory, where the majority of alternative frameworks are built specifically by relaxing or replacing the independence axiom.

Rank-dependent utility (Quiggin, 1982) replaces independence with “comonotonic independence” (independence holds only for gambles that rank outcomes in the same order). The result is a preference functional that includes a probability weighting function, which distorts the cumulative distribution before integrating against the utility function.

Cumulative prospect theory (Tversky and Kahneman, 1992) combines probability weighting with reference-dependence and loss aversion.
It was developed to explain empirical patterns of choice under risk, and it violates independence in multiple ways.

Quadratic utility (Chew, Epstein, and Segal) allows the preference functional to be a bilinear form in probabilities, meaning it is quadratic rather than linear in the probability measure. This captures something akin to sensitivity to the variance of a gamble, not just its mean.

Betweenness preferences (Dekel, 1986; Chew, 1989) weaken independence to the requirement that if you are indifferent between two lotteries, any mixture of them is equally good. This is strictly weaker than full independence and yields preference functionals defined by implicit functional equations rather than explicit integrals.

This convergence is not coincidental. When multiple independent research programs, developed by different people with different motivations over several decades, all arrive at the same structural move (relax independence), it suggests that the constraint being relaxed is objectively too strong.

Allais and Ellsberg Behavior Is Rational

Allais Paradox

The Allais paradox is the oldest and most famous demonstration that people systematically violate the independence axiom. The setup, in its simplified form, goes like this.

In situation one, you choose between gamble A (certainty of one million euros) and gamble B (89% chance of one million, 10% chance of five million, 1% chance of nothing). Most people choose A.

In situation two, you choose between gamble C (11% chance of one million, 89% chance of nothing) and gamble D (10% chance of five million, 90% chance of nothing). Most people choose D.

But the move from situation one to situation two is exactly a common-consequence substitution: you strip out the same 89% component from both options in each pair. Independence says this shouldn’t change your preference, so if you chose A over B, you should choose C over D.
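The common-consequence structure can be verified directly. In this sketch each lottery is a dict from payoff to probability; the decomposition weights (0.89 on the common component, 0.11 on the distinguishing sub-lottery) come from the setup above, and the square-root utility at the end is just an assumed example.

```python
import math

def mix(common, sub, w=0.89):
    """Compound lottery: with prob w draw from `common`, else from `sub`."""
    out = {}
    for lottery, weight in ((common, w), (sub, 1 - w)):
        for payoff, prob in lottery.items():
            out[payoff] = out.get(payoff, 0.0) + weight * prob
    return out

MILLION = 10**6
sub_safe = {MILLION: 1.0}                    # the "sure million" sub-lottery
sub_risky = {5 * MILLION: 10/11, 0: 1/11}    # the "shot at five million" sub-lottery

A = mix({MILLION: 1.0}, sub_safe)   # {1M: 1.0}
B = mix({MILLION: 1.0}, sub_risky)  # {1M: 0.89, 5M: 0.10, 0: 0.01}
C = mix({0: 1.0}, sub_safe)         # {0: 0.89, 1M: 0.11}
D = mix({0: 1.0}, sub_risky)        # {0: 0.90, 5M: 0.10}

def eu(lottery, u):
    return sum(p * u(x) for x, p in lottery.items())

# For ANY utility function the common component cancels, so an expected
# utility maximizer must rank (A vs B) and (C vs D) identically.
u = math.sqrt  # assumed example utility
gap_1 = eu(A, u) - eu(B, u)
gap_2 = eu(C, u) - eu(D, u)
print(gap_1, gap_2)  # identical up to floating point
```

The same 0.89-weighted component appears in both members of each pair, so for any choice of u the two gaps are equal: no expected utility maximizer, whatever their risk attitude, can choose A over B and D over C.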
People do the opposite, and this is treated as evidence of irrationality, a “paradox” revealing that human risk cognition is systematically biased. I want to argue that it is not a paradox at all. It is rational behavior that only looks paradoxical if you insist on evaluating each branch of a lottery independently of every other branch, which is exactly what the independence axiom demands and exactly what a holistic reasoner should not do.

Consider why people choose A in situation one. The certainty of one million is qualitatively different from a 99% chance of getting at least one million with a 1% chance of getting nothing. That 1% of nothing looms large because of what it means in context: you are giving up a sure million for a gamble that could leave you with nothing. The certain outcome provides a floor, a guaranteed trajectory, and evaluating the gamble requires considering what happens along the entire trajectory, including the branch where you get nothing while knowing you could have had a certain million.

Now consider situation two. Both options involve a high probability of getting nothing. There is no certainty to give up, no floor to sacrifice. The context has fundamentally changed: you are already in a world where you will probably get nothing, and the question is just whether to take a slightly higher probability of a moderate payout or a slightly lower probability of a much larger one. In this context, going for the higher expected value is sensible.

The shift from A-over-B to D-over-C is a rational response to the fact that the overall risk structure of the gamble has changed. The “common component” (the 89% that was stripped out) was not psychologically or strategically inert: in situation one, it was providing certainty; in situation two, it was providing nothing.
Stripping it out changed the context in which the remaining options are evaluated, and a holistic reasoner, one who evaluates their total exposure rather than decomposing gambles into independent branches, should respond to that change.

This is precisely the point we made with the example in the introduction to section 3. If the common component C is a large safety net, you can afford to take more risk on the remaining branch. If C is negligible, you should be more conservative. Your preference between A and B should depend on what else is in the package, because you are one agent facing the total distribution, not a collection of independent sub-agents each evaluating one branch in isolation.

The important distinction here is between the descriptive claim and the normative claim. The descriptive claim (people violate independence in the Allais pattern) has been known since 1953 and is not controversial. What is controversial is the normative status of this behavior. The standard treatment in economics and in much of the rationality community says: people violate the axiom, this is a bias, and ideally they should be corrected. The position I am defending is the opposite: people violate the axiom because the axiom is too strong, their behavior reflects a rational holistic evaluation of the gamble’s structure, and the “correction” (forcing independence-compliant preferences) would make them worse decision-makers, not better ones.

Ellsberg Paradox

The Ellsberg paradox involves a related but distinct phenomenon: ambiguity aversion. The classic setup: an urn contains 30 red balls and 60 balls that are either black or yellow in unknown proportion. You can bet on the color of a drawn ball. Most people prefer betting on red (known probability of 1/3) over betting on black (unknown probability, could be anything from 0 to 2/3), even though if you assign your best-estimate probability of 1/3 to black, the expected values are identical.
This is typically treated as another “irrational” bias: the probabilities are the same in expectation, so why should ambiguity matter?

Ergodicity economics provides a natural and, I think, quite elegant resolution, and it comes in two layers.

The first layer is a direct Jensen’s inequality argument. Under multiplicative dynamics, the time-average growth rate of a repeated gamble is a concave function of the probability. For a simple multiplicative bet with fraction f of wealth wagered, the growth rate is something like g(p) = p·log(1+f) + (1−p)·log(1−f).

Now consider the Ellsberg urn. The number of black balls could be 0, 1, 2, …, 60. If you are maximally uncertain and average uniformly over these possibilities, the expected number of black balls is 30, so the expected probability of drawing black is 30/90 = 1/3, which matches the known probability of red. An ensemble-average reasoner sees no difference: E[p] = 1/3 in both cases, so the expected value of the gamble is the same.

But concavity of g in p means that Jensen’s inequality applies:

E[g(p)] < g(E[p])

The average time-average growth rate across all possible urn compositions is strictly less than the time-average growth rate you get when the probability is known to be 1/3. Each distinct urn composition (0 black balls, 1 black ball, 2 black balls, and so on) defines a different multiplicative process with a different time-average growth rate. You can compute all 61 of these growth rates and average them, and that average will be strictly lower than the single growth rate corresponding to the known 1/3 probability, because you are averaging a concave function. The gap is mathematically inevitable, and it is completely invisible to ensemble averaging.

The second layer is about strategic optimality. Even beyond the Jensen’s inequality point, an agent under multiplicative dynamics has a further reason to prefer known probabilities: strategy calibration.
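The calibration point can be made concrete with a quick numerical sketch (my own illustration, using a bare even-odds bet rather than the urn itself): the growth rate g(f) = p·log(1+f) + (1−p)·log(1−f) is maximized at the Kelly fraction f* = 2p − 1, and sizing the bet for a wrong point estimate of p strictly reduces the growth achieved under the true p.

```python
import math

def growth_rate(p, f):
    """Per-round time-average log growth when betting fraction f of wealth at even odds."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def kelly(p):
    """Kelly fraction for an even-odds bet with win probability p (don't bet if p <= 1/2)."""
    return max(0.0, 2 * p - 1)

p_true = 0.6                      # hypothetical true win probability
f_best = kelly(p_true)            # 0.2: optimal for the true probability
f_over = kelly(0.7)               # 0.4: calibrated to a too-high estimate (over-betting)
f_under = kelly(0.55)             # 0.1: calibrated to a too-low estimate (under-betting)

g_best = growth_rate(p_true, f_best)
assert growth_rate(p_true, f_over) < g_best    # aggressive miscalibration hurts
assert growth_rate(p_true, f_under) < g_best   # conservative miscalibration also hurts
```

Ambiguity about p denies the agent the ability to land on f*: every resolution of the ambiguity that differs from the point estimate costs growth.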
The optimal strategy (the Kelly fraction, or more generally whatever the ergodic mapping prescribes) depends on the probabilities. When probabilities are known, you can tune your bet size precisely and achieve the optimal time-average growth rate. When probabilities are ambiguous, you cannot.

The Kelly criterion is uniquely optimal: any deviation from the correct Kelly fraction, whether you bet too aggressively or too conservatively, strictly reduces the time-average growth rate. If the true probability of black is 1/6 and you bet as if it were 1/3, you are over-betting and your growth rate suffers. If the true probability is 1/2 and you bet as if it were 1/3, you are under-betting and your growth rate also suffers, less dramatically but still measurably. Regardless of what the true probability turns out to be, as long as it differs from your point estimate, your trajectory-level performance is strictly worse than what you could have achieved with known probabilities.

So the agent who prefers known probabilities is, in effect, saying: “I want to be able to optimize my strategy for the actual stochastic process I am embedded in, and I can only do that if I know the parameters of that process.”

How LessWrong Has Engaged with This

The LessWrong community has discussed the independence axiom and related questions multiple times over the past fifteen years, and the landscape is instructive. The pieces are mostly there: the right questions have been asked, the right concerns have been raised, and in one remarkable comment, the right conclusion has been stated almost verbatim. But the pieces have never been assembled into a unified argument.

Armstrong’s “Expected Utility Without the Independence Axiom” (2009)

Stuart Armstrong’s post is, to my knowledge, the earliest serious treatment of dropping independence on LessWrong, and it gets a lot right.
Armstrong correctly identifies independence as the most controversial vNM axiom and explores what kind of decision theory remains when you drop it. This was valuable groundwork, and it is to Armstrong’s credit that he took the question seriously at a time when the LessWrong consensus was (and to a significant extent still is) that violating any vNM axiom is ipso facto irrational.

However, Armstrong reaches one conclusion that I think is wrong. His central result is that when an agent faces many lotteries, and those lotteries are independent and have bounded variance, the agent’s aggregate behavior converges to expected utility maximization even without the independence axiom. He writes: “Hence the more lotteries we consider, the more we should treat them as if only their mean mattered. So if we are not risk loving, and expect to meet many lotteries with bounded SD in our lives, we should follow expected utility.”

This is a correct result within its assumptions, but the assumptions exclude exactly the cases where abandoning independence matters most. Armstrong’s convergence argument relies on two things: that the lotteries are independent of each other, and that they aggregate additively (so that the law of large numbers, in its standard additive form, applies to their sum). Under these conditions, yes, the variance of the aggregate shrinks relative to the mean, and the mean dominates, which is equivalent to expected utility maximization.

But for an agent making sequential decisions where wealth compounds multiplicatively, the aggregation is not additive. The relevant law of large numbers for multiplicative processes concerns the geometric mean, not the arithmetic mean. And the geometric mean of a set of multiplicative gambles is determined by the time-average growth rate (the expected logarithm of the growth factor), not by the expected value. The convergence is to the time average, not the ensemble average.
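The gap between the two averages is easy to exhibit. A minimal sketch (my own illustration with made-up numbers, not Armstrong's example): a repeated gamble that multiplies wealth by 1.5 or 0.6 with equal probability has a positive ensemble-average return but a negative time-average growth rate, so almost every individual trajectory decays.

```python
import math
import random

# Each round, wealth is multiplied by 1.5 or 0.6 with equal probability.
up, down, p = 1.5, 0.6, 0.5

ensemble_growth = p * up + (1 - p) * down                     # arithmetic mean: 1.05 > 1
time_avg_growth = p * math.log(up) + (1 - p) * math.log(down) # approx -0.053 < 0

assert ensemble_growth > 1   # expected wealth grows 5% per round...
assert time_avg_growth < 0   # ...yet the typical trajectory shrinks

# Simulate one long trajectory: the geometric mean (time average) dominates.
random.seed(0)
w = 1.0
for _ in range(10_000):
    w *= up if random.random() < p else down
assert w < 1.0               # the single trajectory has decayed
```

An additive law of large numbers applied to the ensemble average says "take the bet"; the multiplicative one, governed by E[log(growth factor)], says the opposite.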
The same line of reasoning applies to any non-additive aggregation of gambles, not only the multiplicative case.

Scott Garrabrant’s comment (2022) — Updatelessness and independence

In December 2022, Scott Garrabrant left a comment beneath a post on expected utility theory that I consider one of the most important things written on LessWrong in the context of this question. I want to quote the core of it and then explain why it matters for my argument. Garrabrant wrote:

My take is that the concept of expected utility maximization is a mistake. […] As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently. […] Von Neumann did not notice this mistake because he was too busy inventing the entire field. The point where we discover updatelessness is the point where we are supposed to realize that all of utility theory is wrong. I think we failed to notice.

The argument, unpacked, goes like this. The vNM framework, and every axiomatization of utility that Garrabrant is aware of, implicitly assumes updating: when you observe something (say, a coin comes up heads), you condition on that observation and from that point forward you only care about worlds consistent with it. The worlds where the coin came up tails are discarded from your deliberation. This is Bayesian updating applied to preferences, not just beliefs, and it is so deeply embedded in the framework that it is usually invisible.

But the LessWrong/MIRI decision theory research program discovered, through work on Updateless Decision Theory and its successors, that updating is not a requirement of rationality. An updateless agent does not narrow its caring when it makes an observation.
Now here is the connection, which is the reason I have presented Garrabrant’s comment at length. The updating step that Garrabrant identifies as the hidden assumption in utility theory is, formally, the same thing as the branch-by-branch evaluation that the independence axiom encodes. When you update on “the coin came up heads,” you evaluate your remaining options conditional on this observation, ignoring the tails branch. Independence says this conditional evaluation should be the same regardless of what was on the tails branch, precisely because you are supposed to discard the tails branch after updating. An updateless agent, by contrast, evaluates the entire policy (covering both heads and tails) as a single object, and the value of the heads-branch action depends on what the tails-branch action is, because both are part of the same globally optimized policy.

This is structurally parallel to the EE critique: the time-average reasoner evaluates the entire trajectory (all branches, the full compounding structure) as a unified object, rather than decomposing it into independent branches and evaluating each one after updating on which branch was realized.
The EE agent is, in Garrabrant’s terminology, updateless with respect to the temporal unfolding of their wealth process.

Two completely independent lines of thought, one coming from physics and the mathematics of stochastic processes, the other coming from the philosophical and logical analysis of decision theory within the rationalist community, converge on the same structural conclusion: the independence axiom encodes a branch-by-branch, post-update evaluation that is not required by rationality, and the most reflectively coherent agents are those who evaluate holistically rather than branch-by-branch.

Academian’s “VNM Expected Utility Theory: Uses, Abuses, and Interpretation” (2010)

Academian’s post covers a lot of ground, but the section relevant to our discussion is section 5, titled “The independence axiom isn’t so bad.”

Academian’s defense of independence rests on what he calls the Contextual Strength (CS) interpretation of vNM utility. The idea is that vNM-preference should be understood as “strong preference” within a given context of outcomes. When the vNM formalism says you are indifferent between two options (S = D in the parent-giving-a-car-to-children example), this does not mean you have no preference at all. It means you have no preference strong enough that you would sacrifice probabilistic weight on outcomes that matter in the current context in order to indulge it. Under this interpretation, the independence axiom’s requirement that S = D implies S = F = D (where F is the coin-flip mixture) just means you wouldn’t sacrifice anything contextually important to get the fair coin flip over either deterministic option. You can still prefer the coin flip in some weaker sense; you just can’t prefer it strongly enough to trade off against the things that actually matter.

I want to acknowledge that this is a well-crafted defense, and Academian is admirably honest about most of its limitations.
But the CS defense has a critical limitation that Academian does not address: it works only for small, contextually negligible independence violations. The parent-and-car example involves a marginal preference for fairness that is, as Academian argues, plausibly too weak to warrant probabilistic sacrifice in a context that includes weighty outcomes. Fine. But the independence violations that arise in the settings this article is concerned with are not marginal at all.

Consider again the gamble example from section 3. You are choosing between gambles A and B, and the common component C is either a large safety net (ten million euros) or a trivial amount (five euros). Your preference between A and B flips depending on what C is: with the large safety net, you take the risky option; without it, you take the safe one. This is not a whisper of a preference that disappears when larger considerations are in play, but a robust, large-magnitude shift in risk strategy driven by the structural properties of your total exposure. The CS interpretation cannot accommodate this, because the whole point of CS is that independence violations are contextually negligible, and in the cases that matter for EE and for real-world sequential decision-making, they are anything but.

Fallenstein’s “Why You Must Maximize Expected Utility” (2012)

Benja Fallenstein’s post is the most rigorous and carefully argued defense of expected utility maximization on LessWrong, and it is the one that most directly claims what the title says: that you must maximize expected utility. If the argument of this article is correct, Fallenstein’s post is where the disagreement is sharpest.

Fallenstein’s setup is this. You have a “genie,” a perfect Bayesian AI, that must choose among possible actions on your behalf. The genie comprehends the set of all possible “giant lookup tables” (complete plans specifying what to do in every conceivable situation) and selects the one that best satisfies your preferences.
Preferences are defined over “outcomes,” which are data structures containing all and only the information about the world that matters to your terminal values. The genie evaluates probability distributions over these outcomes.

Within this setup, Fallenstein argues for independence by analogy with conservation of expected evidence. He writes: “The Axiom of Independence is equivalent to saying that if you’re evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no less attractive, then you should already be finding it more attractive than you do.” He then addresses the parent/car/coin counterexample by arguing that if you care about the randomization mechanism, this should already be encoded in the outcome, not in the preference over lotteries.

This is a strong argument, and it is correct within its setup. If you accept the timeless-genie framing, where a perfect Bayesian evaluates all possible world-histories simultaneously and chooses among complete plans from a god’s-eye view, then independence is very nearly trivially true. The genie faces a single, static decision over probability distributions. There is no temporal sequence, no compounding, no intermediate node at which the genie might re-evaluate. The genie simply picks the best plan, and the best plan is the one whose probability distribution over outcomes ranks highest. In this setting, asking whether the “common component” should influence the evaluation is like asking whether an irrelevant column in a spreadsheet should affect which row you pick: obviously not, because you’re evaluating the whole row at once.

But the force of this argument depends entirely on whether you accept the timeless-genie framing as the correct idealization of rational decision-making.
And this is precisely what ergodicity economics and the updatelessness research program both call into question.

The genie exists outside of time. It surveys the entire space of possible histories from above, assigns probabilities, and computes weighted sums. This is the ensemble-averaging perspective, formalized as a decision procedure. And it is a perfectly coherent idealization, one of the possible “geometries” of decision theory. But it is not the only one. An agent who is embedded in a temporal process, who faces sequential decisions with compounding consequences, who cannot step outside of time and evaluate all histories simultaneously, lives in a different geometry. For this agent, the temporal structure of the process, the order in which decisions are made, the way outcomes compound, and the path-dependence of wealth dynamics are the central features of the decision problem.

Fallenstein’s argument shows that if you accept the timeless-genie setup, you get expected utility maximization, not that you must accept the timeless-genie setup. The question that EE raises, whether a temporally embedded agent facing sequential compounding decisions should evaluate trajectories holistically rather than decomposing them into independent branches, falls entirely outside Fallenstein’s framing. It is not addressed because it cannot be addressed within that framing, just as questions about the curvature of space cannot be addressed within Euclidean geometry. You need a different geometry to even ask them.

Just Give Up on EUT

I think we just need to abandon EUT, once and for all. It is bad at describing humans, bad at describing AIs, and bad at describing potential superintelligences.

The argument for this conclusion has three legs, and I want to make sure all three are visible:

1. Theoretical. The independence axiom is sufficient but not necessary for avoiding Dutch-book exploitation.

2. Empirical.
The Allais paradox, the Ellsberg paradox, and the general instability of estimated risk-aversion parameters across contexts are not bugs in human cognition that education or debiasing should correct, but features: exactly what you would expect from agents who evaluate their total risk exposure holistically rather than branch by branch.

3. Convergence of independent research programs. Ergodicity economics and the updateless decision theory program, independently and from completely different starting points, converge on the same structural insight: the branch-by-branch, post-update evaluation that the independence axiom encodes is just one possible rational way to face uncertainty, and there are others.

The rationalist community has an enormous intellectual investment in expected utility maximization. It is woven into the foundations of how this community thinks about decision theory, about AI alignment, about what it means for an agent to be rational. Eliezer’s sequences treat EU maximization as nearly axiomatic. The vNM theorem is invoked routinely as a constraint on what rational agents can look like. A great deal of alignment-relevant reasoning (about corrigibility, about value learning, about what kinds of objective functions a superintelligent agent would have) implicitly assumes that sufficiently rational agents are EU maximizers. That investment is all the more reason to show the grace of pivoting away from EUT and acknowledging the problems with the independence axiom.

János Bolyai wrote to his father: “Out of nothing I have created a strange new universe.” We shouldn’t be afraid to do the same in decision theory.

^ The terminology “sophisticated choice” was consolidated by Hammond (1976) and especially by McClennen (1990), who contrasted it explicitly with resolute choice. McClennen’s book is thus the key source for the three-way taxonomy: naive, sophisticated, and resolute choice.