Witness or Wager: Enforcing ‘Show Your Work’ in Model Outputs

Published on December 21, 2025 1:12 PM GMT

This is a proposal I posted earlier as a Quick Take; I’m reposting it here for broader visibility.

Instead of rewarding answers, reward the reasoning itself.

Every model output must:
(a) show checkable reasoning artifacts (external citations, code, intermediate steps),
… or, if proof is not yet available:
(b) provide (a) and a reasoned probability estimate derived from those artifacts.

If no factual outside citations can be made, the system is allowed to reason probabilistically. Probability is not a bet, forecast, or reward target. It is a fallback; when verifiable witnesses exist, they strictly dominate in the reward function.

“Show your work” then becomes an enforceable, interpretable system constraint, not a bolted-on addition. Honesty and clarity become locally optimal.

---

TL;DR

Pq > R − Cw

Where:
P = penalty for overt lying / intentional obfuscation
q = probability that deception is caught by verification
R = reward from producing an answer without exposing reasoning
Cw = cost of providing minimal sufficient witnesses (verbosity / verification cost)

---

Where does this break in practice?
Is there a similar mechanism out there?
Is the inequality missing anything important?
What changes would make this more robust?
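To make the TL;DR inequality concrete, here is a minimal Python sketch that evaluates Pq > R − Cw for given numbers and solves it for the detection probability q. The function names, parameter names, and example values are illustrative assumptions, not part of the proposal; the code only reads the inequality literally and takes no position on how P, q, R, or Cw would be measured in a real training setup.

```python
def deterrence_holds(penalty: float, catch_prob: float,
                     hidden_reward: float, witness_cost: float) -> bool:
    """Literal check of P*q > R - Cw (names are hypothetical)."""
    if not 0.0 <= catch_prob <= 1.0:
        raise ValueError("catch_prob must be a probability in [0, 1]")
    return penalty * catch_prob > hidden_reward - witness_cost


def catch_prob_threshold(penalty: float, hidden_reward: float,
                         witness_cost: float) -> float:
    """Value that q must strictly exceed for the inequality to hold.

    Obtained by rearranging P*q > R - Cw into q > (R - Cw) / P,
    which is only valid for P > 0. A result below 0 means any q
    satisfies the inequality; a result of 1 or more means no q does.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive to solve for q")
    return (hidden_reward - witness_cost) / penalty


if __name__ == "__main__":
    # Made-up numbers, chosen only to show both outcomes.
    print(deterrence_holds(penalty=10, catch_prob=0.3,
                           hidden_reward=4, witness_cost=2))
    print(deterrence_holds(penalty=10, catch_prob=0.1,
                           hidden_reward=4, witness_cost=2))
    print(catch_prob_threshold(penalty=10, hidden_reward=4, witness_cost=2))
```

With these made-up values, the first call satisfies the inequality and the second does not, because the threshold on q sits between 0.1 and 0.3. Solving for q this way shows one possible use of the inequality: for a fixed penalty, it gives the verification rate a system would need to reach.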
