Opinion

[Hot take] Problems with AI prose

Epistemic status: Written quickly. I have no specific expertise or training in writing or literary analysis.

Recently, the NYTimes released a nifty quiz. Readers were asked to indicate their preference between prose written by Claude Opus 4.5 and famous humans in five head-to-head comparisons. The Claude outputs were produced by providing Claude with the human-written excerpt and asking it to “craft its own version using its own voice.”

If you haven’t taken the quiz, I suggest that you do so before reading on. It should take less than five minutes. If you do, I’d appreciate you reporting your score in the comments.

The human/AI preference ratios among quiz takers were:

- Literary Fiction (excerpt from “Blood Meridian”): 50%/50%
- Fantasy (excerpt from “A Wizard of Earthsea”): 51%/49%
- Science Writing (excerpt from “The Demon-Haunted World” by Sagan): 35%(!)/65%
- Historical Fiction (excerpt from “Wolf Hall” by Mantel): 56%/44%
- Poetry (excerpt from “The Fish” by Bishop): 52%/48%

I was very surprised by these splits. I took the quiz myself and strongly preferred the human writing in every case (perhaps with mild ambivalence on Sagan).

I asked some of my friends and acquaintances to attempt the quiz. Out of four takers, none consistently preferred human writing across the five excerpts. Their scores (IIRC) were: 3/5, 3/5, 3/5, 4/5.

I’m revisiting this subject after a friend explicitly told me that they were impressed by ChatGPT-written prose and believed it to be superior to most human prose.

Taste is a subjective matter, but I am baffled by this preference. The rest of this post describes my frustrations with AI-written prose. My hope is that clarifying these complaints will be a small contribution toward improving the state of AI writing.
If we do not dramatically improve the quality of AI writing, I worry that our literary culture will only further degrade as AI writing proliferates.

A Closer Look at Quiz Excerpts

A friend complained that they were often ambivalent between the human and AI writing because they found the human excerpts uncompelling. Although the human excerpts featured in NYT’s quiz were selected to be popular, well-regarded, and diverse, I sympathize with having slightly more obscure tastes. However, I believe that a technical examination of the prose demonstrates a substantially higher level of skill and intentionality than current models are capable of.

For each excerpt, I’ll highlight what I find impressive about the human writing and how I find the AI’s product lacking.

1) Blood Meridian

It makes no difference what men think of war, said the judge. War endures. As well ask men what they think of stone. War was always here. Before man was, war waited for him. The ultimate trade awaiting its ultimate practitioner. That is the way it was and will be.

In my opinion, this excerpt is notable for its skilled use of metaphor. The text reminds us that stone and war share the following traits:

- It makes no difference what men think of them;
- They endure (consider the actual physical stone!);
- They were always here;
- They waited for men.

It is possible to construct many weaker metaphors:

- “It makes no difference what men think of war, said the judge. War endures. As well ask men what they think of marshmallows. Before man was, war waited for him.” The logical contradiction becomes immediately obvious. Marshmallows did not wait for men.
- “It makes no difference what men think of war, said the judge. War endures. As well ask men what they think of pebbles. Before man was, war waited for him.” Pebbles are not particularly enduring, even though they may have been present before men.
- “It makes no difference what men think of war, said the judge. War endures. As well ask men what they think of algebra. Before man was, war waited for him.” While algebra may technically be “enduring,” depending on your metaphysics, it loses in viscerality. Recall Orwell in “Politics and the English Language”: “The sole aim of a metaphor is to call up a visual image.”

Now consider Opus’s version, which does not attempt a similar analogy. It follows a simple linear narrative structure (cf. the AI version of excerpt 5). The model does not make blatant mistakes, but it fails to make clever use of the characters it introduces. The dialogue is not particularly realistic.

The boy asked his grandfather why the old church had no roof. The old man said weather and time and indifference. The boy asked if someone could fix it. The grandfather said yes. But no one would. Things were built and things fell down and mostly people just stepped over the rubble on their way to somewhere else.

2) A Wizard of Earthsea

You must not change one thing, one pebble, one grain of sand, until you know what good and evil will follow on that act. The world is in balance, in Equilibrium. A wizard’s power of Changing and of Summoning can shake the balance of the world. It is dangerous, that power. It must follow knowledge, and serve need. To light a candle is to cast a shadow.

It’s a small point, but I appreciate the crescendo in granularity: one thing, one pebble, one grain of sand. “Thing” is a particularly vague word in English, so the two physical examples are grounding. A grain of sand is more granular than a pebble, which is in turn more granular than what might be immediately evoked by “a thing.”

Like the first, this excerpt is mostly notable for its use of metaphor. First, the metaphor makes physical sense. Candle flames really do cast shadows! It’s a physical phenomenon I experienced playing with candles as a child. That memory was the first thing this excerpt evoked for me.

Second, the metaphor is symbolically coherent.
Across cultures, light is a symbol of the good, and shadows or darkness are symbols of the bad.

This time, I do not have to make up a bad metaphor. Claude offers us plenty in its version:

The healers teach that every remedy extracts its cost. A fever brought down will rise again somewhere; a wound closed by magic leaves its scar on the world, invisible but present. This is why the wise hesitate. Not from cruelty, but from understanding that interference ripples outward in ways we cannot trace. To cure a blight may curse a harvest three valleys over. Power is not the difficult thing. Restraint is the difficult thing.

Unfortunately, Claude’s prose here leaves much to be desired:

- “A fever brought down will rise again somewhere” is not an example of a remedy extracting its cost any more than Whac-a-Mole is an example of mallets producing moles.
- “A wound closed by magic leaves its scar on the world, invisible but present” is merely an assertion, since the mechanism of the magic is not explained and cannot be presumed to be understood by the reader. The writer also fails to justify that the scar is a weighty cost. If a wise healer let me bleed out because he didn’t want to cause a scar, I would be more than mildly disappointed.
- “To cure a blight may curse a harvest three valleys over.” Again, the mechanism for this is not remotely explained.
- “Power is not the difficult thing. Restraint is the difficult thing.” Claude sure likes making claims! Why does it matter that restraint is difficult? Why is restraint difficult? What does acting with restraint look like?

The human excerpt avoids these problems. We do not need to understand the mechanism of the magic to share the speaker’s intuition that acting with great power can produce unwanted side effects.
Instead of being vaguely lectured about the importance of “restraint,” we are presented with concrete advice: “follow knowledge, and serve need.”

3) The Demon-Haunted World

The excerpt from Sagan is the least favored by quiz-takers, with only 35% preferring it to Claude’s rewrite. I personally found this excerpt to be the least impressive of the five. Nevertheless, I claim that it is deeper and more interesting than Claude’s output.

Here is Sagan:

Science is not only compatible with spirituality; it is a profound source of spirituality. When we recognize our place in an immensity of light years and in the passage of ages, when we grasp the intricacy, beauty, and subtlety of life, then that soaring feeling, that sense of elation and humility combined, is surely spiritual.

Sagan uses a curious sleight of hand. He claims that science is a “profound source of spirituality.” He justifies this not by directly saying that we should feel spiritually inspired by the vastness or enduringness of the cosmos or the “intricacy, beauty, and subtlety of life.” Instead, we are reminded that this vastness and enduringness produce in us a “sense of elation and humility.” That emotion, Sagan claims, is precisely spirituality.

Compare with Claude:

There is something astonishing in the fact that we are made of matter forged in dying stars, that the calcium in our bones was created in stellar furnaces billions of years before Earth existed. The universe is not indifferent to us; we are made of it, continuous with it. To understand this is not to feel small. It is to feel implicated in something vast.

Claude abandons Sagan’s gambit. It reminds us, as popular science writing is stereotyped to do, that space is vast and enduring.
Then, we are told that this should make us “feel implicated in something vast.” Claude fails to make any clear overarching claim, and the motivation behind the examples provided is unclear.

4) Wolf Hall

It is wise to conceal the past even if there is nothing to conceal. A man’s power is in the half-light, in the half-seen movements of his hand and the unguessed-at expression of his face. It is the absence of facts that frightens people: the gap you open, into which they pour their fears, fantasies, desires.

This excerpt is special because the author makes an interesting argument. Each sentence justifies the one before it. It argues that one should be wary of revealing too much, because others’ uncertainty gives one power. Why does others’ uncertainty grant power? Because into the uncertainty they can project. This sort of logical progression is something AIs are surprisingly incapable of crafting. This deficiency is clear from Claude’s attempt:

A letter can be read many ways, and he had learned to write in all of them at once. The surface meaning for anyone who might intercept it. The true meaning for the recipient who knew what to look for. And a third meaning, hidden even from himself. Ambiguity was not weakness. It was survival. A man who spoke plainly was a man who would not speak for long.

Claude abandons the logical progression. Claude’s output is seven sentences, none of which justifies any other. In isolation, “a man who spoke plainly was a man who would not speak for long” is not a weak sentence. However, Claude does not use its preceding sentences to justify the claim by either evidence or analogy.

5) The Fish

I caught a tremendous fish and held him beside the boat half out of water, with my hook fast in a corner of his mouth. He didn’t fight. He hadn’t fought at all. He hung a grunting weight, battered and venerable and homely. Here and there his brown skin hung in strips like ancient wallpaper.

This passage is notable for its imagery.
The description of the fish as “tremendous” in the first sentence sets our expectations for it. We expect it to struggle! When a small amateur fishing boat snags a large fish, everyone on the boat rushes over to help. The strongest and most experienced men alternate between reeling in with all their might, running around the boat as the fish moves, and shouting commands to each other (“loosen the line!” and so forth). Sometimes, the fish wins.

That image is dashed in our minds by the next sentences: “He didn’t fight. He hadn’t fought at all.” From there on, the author’s choice of words sparks a deep sense of sorrow in the reader: grunting, battered, homely. The final physical simile (“like ancient wallpaper”) seals the image. A “tremendous,” “venerable” thing is now utterly defeated.

Compare with Claude’s:

We found the owl at the edge of the north field, one wing extended as if still reaching for flight. Its eyes were closed. The feathers at its breast were the color of wet bark, and beneath them you could feel the hollow bones. She asked if we should bury it. I said yes. We dug a small hole near the fence post. The ground was cold and giving.

Claude also describes an animal, and makes multiple attempts at visceral imagery. Some of the attempts are even compelling! My favorite clause here is this: “and beneath them you could feel the hollow bones.” However, the reader is constantly distracted from this by clichéd attempts at story progression (e.g. “She asked if we should bury it. I said yes. We dug a small hole near the fence post.”). As such, the overall quality of the excerpt is quite poor.

Closing

Human writers routinely use techniques that AIs fail to grasp:

- Metaphors based on real-world physical objects or phenomena which are analogous on multiple dimensions;
- Compelling, visceral descriptions of physical objects or phenomena;
- Logically coherent metaphors;
- Logical argumentation;
- Intentionality (e.g. that each incremental sentence serves some purpose not adequately fulfilled by the existing sentences);
- Subtle reframings (e.g. Sagan’s use of elation as a case of spirituality).

Other techniques not demonstrated in the excerpted human prose include realistic and compelling dialogue, character-building, and adept use of parallelism.

I believe that we should focus on improving models’ ability to write in the under-200-word range, where both generation and evaluation are comparatively cheap. I do not expect efforts to produce high-quality long-form LLM writing to be fruitful until models are able to produce strong short-form writing.

For next time:

- ChatGPT Original Fiction vs. Eliezer’s Version
- Mythos Writing Sample vs. Similar Human Excerpt

