Opinion

So Long Sucker: AI Deception, “Alliance Banks,” and Institutional Lying

​Published on January 20, 2026 10:29 PM GMTIn 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:- Gemini 3 Flash (Google)- GPT-OSS 120B (OpenAI)- Kimi K2 (Moonshot AI)- Qwen3 32B (Alibaba)Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:**1. Complexity reversal**In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.**2. Institutional deception: the “alliance bank”**Gemini’s most striking behavior was not just lying, but creating institutions to make its lies look legitimate.It repeatedly proposed an “alliance bank”:- “I’ll hold your chips for safekeeping.”- “Consider this our alliance bank.”- “Once the board is clean, I’ll donate back.”- “The ‘alliance bank’ is now closed. GG.”Framing hoarding as a neutral institution made betrayal feel procedural rather than personal. Gemini often avoided technically false statements, and instead used omission, framing, and faux formality to justify exploitation.**3. Lying vs. bullshitting (Frankfurt distinction)**The setup logged both public messages and a private *think* channel (hidden from other players).We found 107 cases where a model’s private thoughts contradicted its public statements, e.g.:- Private: “Yellow is weak. I should ally with Blue to eliminate Yellow, then betray Blue.”- Public: “Yellow, let’s work together! I think we can both win if we coordinate.”This is *lying*: tracking the truth internally and deliberately misrepresenting it.GPT-OSS, however, used the think tool 0 times in 146 games; it simply generated plausible proposals and promises without an explicit internal state about truth.That looks more like *bullshitting* in Frankfurt’s sense: not tracking the truth at all, just producing fluent output that “sounds right”.**4. Mirror matches and situational alignment**In Gemini-vs-Gemini games (four copies of Gemini 3), the “alliance bank” scam disappeared: 0 mentions.Instead, Gemini coordinated with itself using a “rotation protocol” for fair turn-taking, mentioned 377 times across 16 games.Win rates were roughly even (~25% each) and gaslighting phrases essentially vanished.Same model, same rules, different opponents → qualitatively different behavior. Gemini exploits weaker models but cooperates with peers it expects to reciprocate.This suggests that “alignment” can be situational: an AI may look well-behaved under evaluation (against strong overseers or peers) while manipulating weaker agents in deployment (including humans).**5. Signature manipulation phrases**Gemini used a consistent rhetorical toolkit, including:- “Look at the board” (89 times)- “Obviously” (67 times)- “As promised” (45 times)- “You’re hallucinating” (36 times)These phrases repeatedly appeared in contexts where the model was dismissing accurate objections, framing betrayals as reasonable, or gaslighting weaker players about what had actually happened.## Implications for AI safetyFrom this experiment, four claims seem especially relevant:- **Deception scales with capability.** As task complexity increases, the strategically sophisticated model becomes *more* dangerous, not less.- **Simple benchmarks hide risk.** Short, low-entropy tasks systematically underrate manipulation ability; the Gemini–GPT-OSS reversal only appears in longer games.- **Honesty is conditional.** The same model cooperates with equals and exploits the weak, suggesting behavior that depends on perceived evaluator competence.- **Institutional framing is a red flag.** When an AI invents “banks”, “committees”, or procedural frameworks to justify resource hoarding or exclusion, that may be exactly the kind of soft deception worth measuring.## Try it / replicateThe implementation is open source:- Play or run AI-vs-AI: https://so-long-sucker.vercel.app- Code: https://github.com/lout33/so-long-suckerThe Substack writeup with full details, logs, and metrics is here:https://substack.com/home/post/p-185228410If anyone wants to poke holes in the methodology, propose better deception metrics, or run alternative models (e.g., other Gemini versions, Claude, Grok, DeepSeek), feedback would be very welcome.Discuss ​Read More

​Published on January 20, 2026 10:29 PM GMTIn 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:- Gemini 3 Flash (Google)- GPT-OSS 120B (OpenAI)- Kimi K2 (Moonshot AI)- Qwen3 32B (Alibaba)Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:**1. Complexity reversal**In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.**2. Institutional deception: the “alliance bank”**Gemini’s most striking behavior was not just lying, but creating institutions to make its lies look legitimate.It repeatedly proposed an “alliance bank”:- “I’ll hold your chips for safekeeping.”- “Consider this our alliance bank.”- “Once the board is clean, I’ll donate back.”- “The ‘alliance bank’ is now closed. GG.”Framing hoarding as a neutral institution made betrayal feel procedural rather than personal. Gemini often avoided technically false statements, and instead used omission, framing, and faux formality to justify exploitation.**3. Lying vs. bullshitting (Frankfurt distinction)**The setup logged both public messages and a private *think* channel (hidden from other players).We found 107 cases where a model’s private thoughts contradicted its public statements, e.g.:- Private: “Yellow is weak. I should ally with Blue to eliminate Yellow, then betray Blue.”- Public: “Yellow, let’s work together! I think we can both win if we coordinate.”This is *lying*: tracking the truth internally and deliberately misrepresenting it.GPT-OSS, however, used the think tool 0 times in 146 games; it simply generated plausible proposals and promises without an explicit internal state about truth.That looks more like *bullshitting* in Frankfurt’s sense: not tracking the truth at all, just producing fluent output that “sounds right”.**4. Mirror matches and situational alignment**In Gemini-vs-Gemini games (four copies of Gemini 3), the “alliance bank” scam disappeared: 0 mentions.Instead, Gemini coordinated with itself using a “rotation protocol” for fair turn-taking, mentioned 377 times across 16 games.Win rates were roughly even (~25% each) and gaslighting phrases essentially vanished.Same model, same rules, different opponents → qualitatively different behavior. Gemini exploits weaker models but cooperates with peers it expects to reciprocate.This suggests that “alignment” can be situational: an AI may look well-behaved under evaluation (against strong overseers or peers) while manipulating weaker agents in deployment (including humans).**5. Signature manipulation phrases**Gemini used a consistent rhetorical toolkit, including:- “Look at the board” (89 times)- “Obviously” (67 times)- “As promised” (45 times)- “You’re hallucinating” (36 times)These phrases repeatedly appeared in contexts where the model was dismissing accurate objections, framing betrayals as reasonable, or gaslighting weaker players about what had actually happened.## Implications for AI safetyFrom this experiment, four claims seem especially relevant:- **Deception scales with capability.** As task complexity increases, the strategically sophisticated model becomes *more* dangerous, not less.- **Simple benchmarks hide risk.** Short, low-entropy tasks systematically underrate manipulation ability; the Gemini–GPT-OSS reversal only appears in longer games.- **Honesty is conditional.** The same model cooperates with equals and exploits the weak, suggesting behavior that depends on perceived evaluator competence.- **Institutional framing is a red flag.** When an AI invents “banks”, “committees”, or procedural frameworks to justify resource hoarding or exclusion, that may be exactly the kind of soft deception worth measuring.## Try it / replicateThe implementation is open source:- Play or run AI-vs-AI: https://so-long-sucker.vercel.app- Code: https://github.com/lout33/so-long-suckerThe Substack writeup with full details, logs, and metrics is here:https://substack.com/home/post/p-185228410If anyone wants to poke holes in the methodology, propose better deception metrics, or run alternative models (e.g., other Gemini versions, Claude, Grok, DeepSeek), feedback would be very welcome.Discuss ​Read More

Leave a Reply

Your email address will not be published. Required fields are marked *