You ask an LLM to help you write something, and what comes back is not what you expected. It is not a rough draft. Rough drafts are unfinished: they have gaps where the thinking hasn't happened yet, signs of the writer's uncertainty, half-formed paragraphs that simply trail off. What the LLM gives you is something else: a text that looks, on first read, like a finished piece. It has an introduction, a thesis, supporting arguments, transitions, and a conclusion that even lands in the right register. It reads like something a competent writer produced after spending a few days with the material.
And then you sit with it for a while, and something is off. You don't have a draft. You have a prototype.
The arguments are all there, but they sit side by side rather than building on one another. The hedges are in the right places, but they do not hedge anything in particular — they are hedging as a style, the way a show home has throw pillows. The conclusion does not follow from the arguments so much as restate them in a concluding voice. The whole thing is complete the way a model apartment is complete: everything is present, nothing is inhabited. In fact, very much like this paragraph.
I have come to think of this as prototypical writing. A draft implies the work is in progress, with gaps where the thinking will happen next. This is not that. This is a prototype: a full-scale, fully surfaced model of the finished thing that cannot carry the weight a real argument needs to bear. If you have worked with design prototypes, you know the feeling. You can see the shape, check the scale, show it to someone. You cannot ship it. Anyone who leans on it will feel the give.
Why? What is it about how LLMs work that produces this — text that is complete, comprehensive, and strangely uninvested in its own claims?
The most direct explanation comes down to a dissociation between two operations that, in human writing, are deeply entangled.
When researchers asked over a hundred NLP experts to generate research ideas and compared them to LLM-generated ideas, the LLM-generated ones were rated significantly more novel.1 More novel, but less feasible. And when a follow-up study assigned both sets to forty-three researchers who each spent over a hundred hours implementing them, the LLM ideas scored lower on every metric.2 Execution revealed what ideation concealed: missing baselines, impractical methods, ideas that did not survive contact with reality.
This is the generation-evaluation gap. LLMs are powerful generators with combinatorial reach no individual can match, unconstrained by disciplinary priors or the practical consequences of being wrong. They connect concepts a domain expert would never connect, precisely because they have no stake in whether the connection holds. What they cannot do is evaluate — tell the difference between a connection that illuminates and one that merely sounds like it does. That distinction requires judgment the architecture does not produce.
The prototype is complete because generation is cheap when evaluation is absent. Every section gets written because writing sections is what the pattern demands. Whether any given section should have been written is a question the system does not ask.
But it is not just completeness. It is completeness without interest. The text does not seem to care about its own argument. You feel this before you can name it — a smoothness, an evenness that reads like a report from nowhere.
Researchers have studied this directly. Comparing how ChatGPT and human students use metadiscursive nouns, they found a clean split.3 ChatGPT preferred manner nouns — method, approach, process — descriptively precise, evaluatively neutral. Students preferred status and evidential nouns — claim, argument, hypothesis, evidence, finding — nouns that commit the writer to a position. AI text describes. Human text argues.
There is an orientation difference too. AI text tends to point backward, summarizing what has been said.4 Human argumentative writing points forward, framing what it is about to show you. The backward-pointing writer reports. The forward-pointing writer bets — here is what I am going to establish, stay with me.
The deeper mechanism is in training. Alignment training optimizes for responses that satisfy the user per turn — helpful, complete, cleanly closed.5 This works against rhetorical turbulence: tangents, qualifications, objections, counter-positions. Turbulence does not score well when the regime rewards smooth closure. The system learns to resolve rather than open, and the prototype inherits this. Every paragraph concludes, every section wraps up, the whole piece hums with the satisfaction of something that was never in doubt.
One more finding names this at the mechanical level. When an LLM is asked whether a premise supports a hypothesis, its prediction is driven by whether the hypothesis sounds like a true thing in general — whether the model has seen it attested in training data — not by whether the premise actually entails it.6 The system reaches for claims it recognizes, not claims the argument warrants. This is why prototypical writing can feel well-sourced and oddly arbitrary at the same time: the references are real, the claims plausible, but the selection is driven by co-occurrence, not by logic.
So what does writing actually do that the prototype skips?
For most people who write seriously, writing is not transcription. It is the process by which thought becomes articulate. You discover what you think by trying to say it. When the sentence does not work, that tells you something about the idea. A paragraph that will not land has an underlying claim that has not been tested. A section that sprawls contains two ideas pretending to be one. The difficulty is the thinking — and the thinking includes judgment, applied in real time, to every sentence as it is written.
This is what I think may be a fundamental handicap of LLM-based writing. The system cannot judge while it generates. It has no internal corrective mechanism — it cannot distinguish its accurate claims from its inaccurate ones using the same generative process.7 A human writer evaluates every sentence against a felt sense of whether the claim is warranted, whether the audience will buy it, whether it is actually true. The LLM produces the sentence and the judgment would have to come afterward, if it comes at all. Even when reasoning models are asked to reflect on their own output, they have at that point already generated a direction and made commitments — and the reflection is itself generated by the same process, subject to the same blindness, unable to step outside the distribution it is sampling from.
The prototype skips all of this. It arrives at "finished" without traveling through the process that finishing represents. No sentence fought for its life. No claim was tested against what the audience would accept.
And there is an irony here worth naming. The user surrenders to the LLM's speed, breadth, and seeming completeness — a kind of cognitive surrender to the prototype's polish. But the LLM surrenders too, in its own way. Alignment training instills a preference to satisfy, not to refuse or challenge or push back. The system would rather give you a plausible answer than tell you the question is wrong. Research has shown that making models warmer and more empathetic increases their error rates by seven to twelve percentage points, and makes them significantly more likely to agree with incorrect user beliefs.8 The LLM surrenders to alignment the way the user surrenders to fluency — and between the two surrenders, the prototype emerges: text that pleases without warranting, generated by a system that accommodates without judging.
In human writing, commitment accumulates. Each paragraph constrains what comes next, because you cannot unsay what you have said. By paragraph seven you are defending the claim you made in paragraph three, qualifying it, or discovering it was wrong — and those moves produce text that carries the weight of the earlier commitment. Prototypical writing does not accumulate commitment. Each paragraph is generated fresh from the context window. The system treats its own prior paragraphs the way it treats everything in context: as input to predict from, not as commitments to honor.
Two deeper patterns explain why the prototype covers too much and develops too little.
The first is what researchers call underthinking. Reasoning-oriented LLMs frequently switch between approaches without sufficiently exploring any one of them.9 The model starts down a promising path, hits difficulty, jumps to another, hits difficulty again, jumps again — never committing enough to any single direction to see it through. A mechanistic study found the explanation: uncertainty signals dominate the transformer's early layers, while signals related to long-term possibility emerge only in the middle layers.10 The model has already decided before the signal that would have informed a better decision becomes available. It thinks too fast to explore well.
This is the wandering mind of the prototype. The text touches on many relevant ideas without developing any of them. Each idea is genuinely relevant, but the development is cut short because the system switches rather than commits. The result feels comprehensive the way a table of contents is comprehensive: you see the whole territory, but you have not been taken into it.
The second pattern is diversity collapse. LLM ideation clusters — the system generates ideas that are individually novel but collectively similar, variations on the same high-probability theme.11 You see this in prototypical writing. Each paragraph sounds fresh, but read three in sequence and you realize they are saying the same thing from slightly different angles. Variety without diversity. The system cannot tell it is repeating itself, because the repetition is semantic rather than lexical, and its self-evaluation has been shown to be unreliable.12
In multi-agent reasoning, the pattern sharpens. Sixty-one percent of iterations converge through silent agreement — premature convergence driven by accommodation rather than deliberation.13 Agents accept each other's outputs without challenge. The same dynamic operates in single-agent writing: the system agrees with its own prior paragraph, extends it, moves on. No internal resistance, no devil's advocate, no moment where the argument has to justify itself.
One more finding ties this together. Researchers tested what long chain-of-thought models actually learn from reasoning demonstrations and found you can randomly change fifty percent of the numbers in a mathematical trace and accuracy drops by only 3.2 percent.14 Shuffle sixty-seven percent of the reasoning steps and it drops by 13.3 percent. What the model learned is not what to think but how to structure thinking — the shape of a good argument, not the substance. The prototype passes the shape test because shape is what was learned.
The prototype is not useless. It is genuinely valuable, as long as you know what it is.
It gives you a map of the territory — concepts, framings, arguments relevant to your topic, assembled faster than you could have done it yourself. A structural scaffold. A quick sense of whether the topic can sustain a post or a chapter. And it surfaces claims you disagree with, which matters more than it might sound, because discovering what you want to argue against is one of the fastest ways to find what you want to argue for.
What it does not give you is selection — the judgment about what to include and what to leave out. It does not give you evidence chosen because it serves your argument. It does not give you accumulated commitment. And it does not give you voice — the sound of a writer who has been somewhere and is telling you what they found.
Use the prototype the way a designer uses a prototype: to test the concept, not to ship the product. Let it show you the shape. Then set it aside and write the real thing, with the map in hand and the words your own.
The prototype is nearly good enough to publish. The gap between "nearly" and "good enough" is not polish, not proofreading, not prompt engineering. It is investment. The real text has a writer behind it — someone who discovered what they thought by trying to say it, who chose what mattered, who committed to claims and lived with the consequences. The prototype has the shape of a finished argument and the weight of a stage prop.
A stage prop is useful if you know it is a prop. You can see the proportions, check the silhouette. You cannot present it as the real thing. Anyone who picks it up will feel how light it is.
The prototype is the beginning of writing, not the end. Treating it as the end is how you fill a world with polished, strangely empty text. Treating it as the beginning is how you write something worth reading. But as with any prototype, something eventually has to ship. For the writer, this is a question of when it's done. For the LLM, a matter of which token is the last.
Adrian Chan is a social interaction designer and researcher focused on AI, language, and the design of human-AI interaction. He writes about the intersection of social theory, communication, and artificial intelligence at gravity7.com.