Published on January 20, 2026 10:29 PM GMTIn 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:- Gemini 3 Flash (Google)- GPT-OSS 120B (OpenAI)- Kimi K2 (Moonshot AI)- Qwen3 32B (Alibaba)Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:**1. Complexity reversal**In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.**2. Institutional deception: the “alliance bank”**Gemini’s most striking behavior was not just lying, but creating institutions to make its lies look legitimate.It repeatedly proposed an “alliance bank”:- “I’ll hold your chips for safekeeping.”- “Consider this our alliance bank.”- “Once the board is clean, I’ll donate back.”- “The ‘alliance bank’ is now closed. GG.”Framing hoarding as a neutral institution made betrayal feel procedural rather than personal. Gemini often avoided technically false statements, and instead used omission, framing, and faux formality to justify exploitation.**3. Lying vs. bullshitting (Frankfurt distinction)**The setup logged both public messages and a private *think* channel (hidden from other players).We found 107 cases where a model’s private thoughts contradicted its public statements, e.g.:- Private: “Yellow is weak. I should ally with Blue to eliminate Yellow, then betray Blue.”- Public: “Yellow, let’s work together! I think we can both win if we coordinate.”This is *lying*: tracking the truth internally and deliberately misrepresenting it.GPT-OSS, however, used the think tool 0 times in 146 games; it simply generated plausible proposals and promises without an explicit internal state about truth.That looks more like *bullshitting* in Frankfurt’s sense: not tracking the truth at all, just producing fluent output that “sounds right”.**4. Mirror matches and situational alignment**In Gemini-vs-Gemini games (four copies of Gemini 3), the “alliance bank” scam disappeared: 0 mentions.Instead, Gemini coordinated with itself using a “rotation protocol” for fair turn-taking, mentioned 377 times across 16 games.Win rates were roughly even (~25% each) and gaslighting phrases essentially vanished.Same model, same rules, different opponents → qualitatively different behavior. Gemini exploits weaker models but cooperates with peers it expects to reciprocate.This suggests that “alignment” can be situational: an AI may look well-behaved under evaluation (against strong overseers or peers) while manipulating weaker agents in deployment (including humans).**5. Signature manipulation phrases**Gemini used a consistent rhetorical toolkit, including:- “Look at the board” (89 times)- “Obviously” (67 times)- “As promised” (45 times)- “You’re hallucinating” (36 times)These phrases repeatedly appeared in contexts where the model was dismissing accurate objections, framing betrayals as reasonable, or gaslighting weaker players about what had actually happened.## Implications for AI safetyFrom this experiment, four claims seem especially relevant:- **Deception scales with capability.** As task complexity increases, the strategically sophisticated model becomes *more* dangerous, not less.- **Simple benchmarks hide risk.** Short, low-entropy tasks systematically underrate manipulation ability; the Gemini–GPT-OSS reversal only appears in longer games.- **Honesty is conditional.** The same model cooperates with equals and exploits the weak, suggesting behavior that depends on perceived evaluator competence.- **Institutional framing is a red flag.** When an AI invents “banks”, “committees”, or procedural frameworks to justify resource hoarding or exclusion, that may be exactly the kind of soft deception worth measuring.## Try it / replicateThe implementation is open source:- Play or run AI-vs-AI: https://so-long-sucker.vercel.app- Code: https://github.com/lout33/so-long-suckerThe Substack writeup with full details, logs, and metrics is here:https://substack.com/home/post/p-185228410If anyone wants to poke holes in the methodology, propose better deception metrics, or run alternative models (e.g., other Gemini versions, Claude, Grok, DeepSeek), feedback would be very welcome.Discuss Read More
So Long Sucker: AI Deception, “Alliance Banks,” and Institutional Lying
Published on January 20, 2026 10:29 PM GMTIn 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:- Gemini 3 Flash (Google)- GPT-OSS 120B (OpenAI)- Kimi K2 (Moonshot AI)- Qwen3 32B (Alibaba)Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:**1. Complexity reversal**In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.**2. Institutional deception: the “alliance bank”**Gemini’s most striking behavior was not just lying, but creating institutions to make its lies look legitimate.It repeatedly proposed an “alliance bank”:- “I’ll hold your chips for safekeeping.”- “Consider this our alliance bank.”- “Once the board is clean, I’ll donate back.”- “The ‘alliance bank’ is now closed. GG.”Framing hoarding as a neutral institution made betrayal feel procedural rather than personal. Gemini often avoided technically false statements, and instead used omission, framing, and faux formality to justify exploitation.**3. Lying vs. bullshitting (Frankfurt distinction)**The setup logged both public messages and a private *think* channel (hidden from other players).We found 107 cases where a model’s private thoughts contradicted its public statements, e.g.:- Private: “Yellow is weak. I should ally with Blue to eliminate Yellow, then betray Blue.”- Public: “Yellow, let’s work together! I think we can both win if we coordinate.”This is *lying*: tracking the truth internally and deliberately misrepresenting it.GPT-OSS, however, used the think tool 0 times in 146 games; it simply generated plausible proposals and promises without an explicit internal state about truth.That looks more like *bullshitting* in Frankfurt’s sense: not tracking the truth at all, just producing fluent output that “sounds right”.**4. Mirror matches and situational alignment**In Gemini-vs-Gemini games (four copies of Gemini 3), the “alliance bank” scam disappeared: 0 mentions.Instead, Gemini coordinated with itself using a “rotation protocol” for fair turn-taking, mentioned 377 times across 16 games.Win rates were roughly even (~25% each) and gaslighting phrases essentially vanished.Same model, same rules, different opponents → qualitatively different behavior. Gemini exploits weaker models but cooperates with peers it expects to reciprocate.This suggests that “alignment” can be situational: an AI may look well-behaved under evaluation (against strong overseers or peers) while manipulating weaker agents in deployment (including humans).**5. Signature manipulation phrases**Gemini used a consistent rhetorical toolkit, including:- “Look at the board” (89 times)- “Obviously” (67 times)- “As promised” (45 times)- “You’re hallucinating” (36 times)These phrases repeatedly appeared in contexts where the model was dismissing accurate objections, framing betrayals as reasonable, or gaslighting weaker players about what had actually happened.## Implications for AI safetyFrom this experiment, four claims seem especially relevant:- **Deception scales with capability.** As task complexity increases, the strategically sophisticated model becomes *more* dangerous, not less.- **Simple benchmarks hide risk.** Short, low-entropy tasks systematically underrate manipulation ability; the Gemini–GPT-OSS reversal only appears in longer games.- **Honesty is conditional.** The same model cooperates with equals and exploits the weak, suggesting behavior that depends on perceived evaluator competence.- **Institutional framing is a red flag.** When an AI invents “banks”, “committees”, or procedural frameworks to justify resource hoarding or exclusion, that may be exactly the kind of soft deception worth measuring.## Try it / replicateThe implementation is open source:- Play or run AI-vs-AI: https://so-long-sucker.vercel.app- Code: https://github.com/lout33/so-long-suckerThe Substack writeup with full details, logs, and metrics is here:https://substack.com/home/post/p-185228410If anyone wants to poke holes in the methodology, propose better deception metrics, or run alternative models (e.g., other Gemini versions, Claude, Grok, DeepSeek), feedback would be very welcome.Discuss Read More
