The Knowledge Custodian
The Custodial Shift
Part 1 of 3
AI is not replacing experts, but it is transforming what it means to be one. Are we ready for this?
Two hands drawing each other, in the pencil style of a recursive self-portrait — but one hand is rendered with precise graphite line work while the other has dissolved into a charcoal smudge that is nonetheless producing a finished cuff.
The recursive pair has broken down on one side. Both hands are still drawing each other — but one is now a smudge producing a finished cuff it did not earn.

What is expertise? Expertise is knowing things: the cardiologist, how the heart fails; the patent lawyer, whether a claim will hold; the climate scientist, how feedback loops compound. Knowledge, accumulated through years of study and practice, is what separates the expert from everyone else. It is what we recognize as authority.

This framing makes the AI threat seem straightforward. If expertise is knowledge, and AI can access more knowledge faster, then AI will eventually know what the expert knows — and the expert becomes redundant. This is the replacement narrative that has dominated public discussion since the arrival of ChatGPT.

But the replacement narrative misunderstands what expertise actually is. Expertise is not just the possession of knowledge; it is the application of judgment. And judgment is fundamentally communicative. The expert knows what to say, to whom, and in what context. The expert knows why it matters. It is the expert who makes it matter — that's communication. This dimension is so deeply embedded in expert practice that we rarely notice it, and it is precisely the dimension that AI cannot perform.

What Expertise Actually Is

So what does this actually look like in practice? When a cardiologist recommends a treatment to a patient, they are not simply reporting what they know. They are making a judgment about what this particular patient needs to hear, given their condition, their anxiety, and their understanding. They anticipate how the patient will respond to uncertainty. The recommendation is shaped by all of these factors before it is ever spoken. It is an act of social intelligence, not information retrieval.

The same dynamic operates in every expert domain. A policy analyst presenting to a cabinet minister selects different evidence and frames different conclusions than when presenting to a parliamentary committee. The knowledge is the same. The judgment — what to foreground, what to omit, and how to frame the implications — differs entirely, because the audience differs. Expert knowledge is always the selection of relevant information, and relevance is always audience-relative.

As the linguist Emily Bender and colleagues put it, meaning is "the relation between natural language expressions and the communicative intents they can be used to evoke."1 The speaker has a communicative intent and chooses an expression fit to express it in the current communicative situation. The listener then reconstructs using their own knowledge of the situation and their hypotheses about the speaker's state of mind.

What does this have to do with AI? Well, this is not how AI generates output. AI generates statistically likely sequences of tokens — without selecting an expression for a specific communicative situation. It does so without hypothesizing about the listener's state of mind, and without any of the social calibration that makes expert communication effective.

The Grounding Gap
Human Expert
I'd recommend we adjust the dosage.
Does that make sense given what we discussed last time?
The risk is mainly with the interaction between...
Are you following? Should I explain the mechanism?
So the plan would be to taper over two weeks.
What questions do you have?
AI Output
I'd recommend adjusting the dosage.
The risk is mainly with the interaction between the two medications.
The plan would be to taper over two weeks.
The tapering schedule should follow standard protocol for this class of medication.
Monitor for withdrawal symptoms during the transition period.
Follow up in 14 days to reassess.
77.5% fewer grounding acts in LLM output vs human conversation
The expert's conversation is interwoven with checks, questions, and calibration. The AI's output is an unbroken stream of confident assertions. What we experience as fluency is partly the absence of communicative work.

The consequences are measurable. Research on conversational grounding has found that LLM outputs are, on average, 77.5% less likely to contain grounding acts — the clarifying questions, acknowledgments, and understanding-checks that humans use to build shared meaning.2 AI doesn't ask "do you mean X or Y?" It doesn't pause when something is ambiguous or check whether you followed. It just proceeds.

The counterintuitive part is that this absence contributes to the impression of fluency. Checking understanding is a kind of epistemic humility that confident answers don't perform. Clarifying questions would interrupt the flow; acknowledgments would add friction. What we experience as authoritative fluency is partly the absence of the communicative work that genuine expertise requires.

And here is the twist. The alignment process designed to make AI outputs more helpful actively removes this communicative work. Training on contemporary preference data — the feedback signals users give when they rate AI responses — leads to a further reduction in grounding acts.3 We reward confidence and penalize hesitation, and so the models learn to presume shared understanding rather than build it.

As one research group observed, the result is that LLMs communicate "in the 'mansplaining' idiom" — presenting knowledge without metacognition, without the tentativeness that characterizes genuine understanding, and without awareness of what they know, what they don't know, or how well the conversation is going overall.4

The German philosopher Jürgen Habermas — whose work on communicative action and the conditions of rational discourse has shaped how we think about language, legitimacy, and public life for half a century — made a distinction that bears on this directly. "People enter the public space of reasons by being socialized into a natural language and by gradually acquiring the status of a member of a linguistic community through practice. Only with the ability to participate in the practice of exchanging reasons do they acquire the status of responsible authors of actions."5

The expert is a responsible author — someone who participates in the exchange of reasons and is accountable for their claims. AI does not participate in social interaction, has no basis for shared experience, and has nothing at stake. As one group of researchers put it, there is "no sense of satisfaction, pleasure, guilt, responsibility or accountability for what it produces."6

This is where the replacement narrative breaks down. If expertise were simply knowledge, AI could replace it. But expertise is communicative judgment exercised by an accountable participant in a community of practice — and what AI produces is something else entirely. It has the form of expertise without the substance.

The Custodial Transformation

So if AI cannot be the expert, what happens to the expert when AI arrives?

Not replacement, I think, but transformation — and not the kind most commentators describe. We do not become "augmented" experts, freed from drudgery to focus on higher-order thinking. We become custodians. The expert who once spent their days thinking, reading, and discovering now spends them curating, filtering, and managing the outputs of AI systems.

This is the custodial shift. The expert moves from producing knowledge to managing large quantities of AI-generated knowledge. The distinction matters because curation is a fundamentally different cognitive activity from creation. Curation asks: "Is this good enough?" Creation asks: "What is true, and how do I know?"

The Custodial Shift
The Expert (before)
Reading
Questioning
Arguing
Discovering
Connecting
Reflecting
Writing
The Custodian (after)
Prompting
Scanning
Filtering
Validating
Curating
Correcting
Managing
The activities look parallel in shape. They are fundamentally different in kind. The left column produces understanding. The right column evaluates output.

The shift happens gradually and almost invisibly. When we use deep research tools, the library available to us is vast, instantaneous, and arrives fully formed and packaged for use. The research an LLM returns obscures the kinds of connections that motivate the human expert because it was obtained from query matches and probabilities — not from interested, unanswered questions. The knowledge worker's own state of mind is sidelined by pre-packaged content, content that is silent about, and sidesteps, the conversation that knowledge workers and experts use to test and advance their ideas.

Consider what is lost. The slow reading, the following of citations, the stumbling upon unexpected connections, the frustration of not finding what we expected — these are not inefficiencies to be optimized away. They are the process by which understanding develops. Research on how users interact with AI systems has found that intent is not a binary state; it is a continuous maturation that unfolds through interaction. Users often cannot articulate what they want, and AI cannot help them evolve their intent.7 Skip the process of inquiry, and the intent never matures. We arrive at a destination without having traveled there.

The implications for less experienced thinkers deserve particular attention. Senior experts have a reservoir of judgment accumulated through years of the old process. They can evaluate AI outputs against a backdrop of deep understanding. But research on AI's impact on skill formation has shown that "AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated tasks showed some productivity improvements, but at the cost of learning."8 Junior knowledge workers who enter the field in the custodial era never build that reservoir. They learn to manage AI outputs from day one, without having developed the capacity to evaluate what those outputs are missing.

The custodial shift is not just a change in what experts do. It may be a break in how expertise is transmitted between generations.

Search Becomes Steering

What does searching look like when AI is in the loop? Part of the custodial role involves a change in how we find things — and it is more fundamental than most people realize.

When a human expert searches, they are driven by questions — specific, interested, often half-formed questions that arise from the gap between what they know and what they need to know. The search is inquiry: an open-ended exploration where what you find changes what you're looking for. You follow a citation, discover an unexpected connection, revise your question, search again. The expert's interests and curiosity are the search.

When search is mediated by an LLM, the dynamics shift. Generative AI introduces what has been called "intent-based outcome specification" — a paradigm where "users specify what they want, often using natural language, but not how it should be produced."9 The expert no longer searches for information. They steer a generative system by specifying intent.

This is not just a change in tool — it is a change in epistemology. When we type a query into a search engine, we are asking "What exists?" When we prompt an LLM, we are asking "What can you generate?" The difference matters because the LLM's response is shaped by its training distribution — linguistic probabilities, alignment patterns, embedding geometries — not by the state of knowledge in the domain. Vector search captures topical and conceptual relations differently from the interested, unanswered questions that drive human inquiry. As one analysis of retrieval systems found, vector embeddings measure semantic association rather than task relevance — "king/queen" scores 92% similarity while "king/ruler" scores only 83%.10 The map of embedding space is not a map of meaning.
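To make that concrete, here is a minimal sketch of how embedding-based retrieval ranks candidates: cosine similarity over vectors. The vectors below are invented purely for illustration (real embeddings have hundreds or thousands of dimensions, and the 92%/83% figures above come from the cited analysis, not from this toy); the point is only that the score measures contextual association, not relevance to the searcher's task.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard proximity measure in embedding space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 3-dimensional "embeddings" (illustrative only). Terms that appear in
# similar contexts land close together, whether or not they answer the same need.
king  = np.array([1.0, 0.2, 0.1])
queen = np.array([0.9, 0.5, 0.1])   # strong contextual association with "king"
ruler = np.array([0.6, 0.1, 0.8])   # the task-relevant term in, say, a governance query

print(f"king/queen similarity: {cosine_similarity(king, queen):.2f}")
print(f"king/ruler similarity: {cosine_similarity(king, ruler):.2f}")
# A retriever ranking purely by this score surfaces "queen" ahead of "ruler",
# regardless of what the searcher's half-formed question actually needed.
```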

Search vs Steering
Expert Search (inquiry)
Start with a half-formed question
Follow a citation
Unexpected connection
↺ revise the question
Search again, differently
Understanding develops
AI Steering (specification)
Specify intent in a prompt
Embedding space
Probability distribution
Alignment patterns
Receive generated output
Evaluate. Re-prompt. Repeat.
"What exists?" vs "What can you generate?" — the expert's search is recursive and self-revising. The AI's steering is linear, mediated by embedding space rather than domain knowledge.

The custodial expert must therefore develop a new literacy: understanding how LLMs handle topics and their relationships. Where search engine results were ranked by PageRank, authority, popularity, and personal preferences, prompted responses are driven by linguistic probabilities shaped by alignment, human feedback, and reinforcement learning. The custodial work involves internalizing some of how the LLM thinks in order to get the best results from it. Prompting is "typically informal and relies on trial-and-error."11 This is not a trivial skill — but it is a different skill from domain expertise. The expert who excels at querying an LLM is not necessarily the expert who excels at understanding the domain.

And there is a recursion here worth noticing. The expert, whose value lies in domain judgment, must now develop judgment about the AI system that is supposed to augment their domain judgment. The meta-competence — knowing how to prompt effectively — becomes as important as the domain competence — knowing what to ask about.

And generative variability makes this harder still. The same prompt can produce different quality outputs each time, because "outputs may vary in character or quality, even when a user's input does not change."12 The custodian must evaluate which outputs meet domain standards — a judgment that requires the very expertise the AI was supposed to replace.
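One reason for that variability can be sketched at the decoding level. The next-token scores and vocabulary below are hypothetical, and real systems add many further sources of variation, but the core mechanism holds: with a nonzero sampling temperature, the same prompt yields different continuations on different runs, and over long outputs those divergences compound.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Sample one token index from a softmax over logits at the given temperature."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Hypothetical next-token candidates and scores for one fixed prompt.
vocab = ["taper", "reduce", "discontinue", "monitor"]
logits = np.array([2.1, 1.9, 0.7, 0.4])

rng = np.random.default_rng()
for run in range(5):
    choice = vocab[sample_next_token(logits, temperature=0.8, rng=rng)]
    print(f"run {run}: next token = {choice!r}")
# The prompt never changes, but the sampled continuation can. Over hundreds of
# tokens these small divergences compound into outputs of visibly different
# quality, which is precisely what the custodian is left to evaluate.
```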

Then there is the matter of actually supervising AI-conducted searches. Watching or scanning them in real time is cognitively taxing, and it happens too quickly to permit real supervision. So over time we become lazy and simply accept the queries and sources the LLM has chosen. And that is an abdication of one of the expert's core practices — the selection of relevant sources.

The irony is that AI-mediated search is vastly more comprehensive than human search. But comprehensiveness is not the dimension on which expertise operates. As one study of LLMs in engineering design found, "human thought and world knowledge is required to reduce the solution space in advance and come up with original ideas."13 AI can explore a solution space, but it cannot reduce it. That takes the domain judgment that defines the custodial role.

Style Substitutes for Thought

One more dimension of the custodial shift deserves attention, and it may be the most insidious of all: the substitution of style for thought.

When an expert produces a report, a visualization, or a presentation, the quality of the artifact reflects the depth of understanding behind it. A well-crafted graph communicates not just data but judgment — what to include, what to exclude, and how to frame the relationships that matter. The quality of the artifact is a signal of the quality of the thinking. We have internalized this heuristic for so long that we barely notice it: professional-looking output implies expert-quality thinking.

AI breaks this heuristic. Generative AI produces artifacts of extraordinary surface quality without any of the underlying judgment. Multi-modal outputs — graphs, diagrams, animations, presentations — look finished and professional. Audiences assume accuracy from appearance. And research confirms the mechanism: models learn to use jargon "as a proxy for quality," generating "responses that give a superficial impression of expertise without being more useful."14 The same study identified five systematic biases in AI output — length, structure, jargon, sycophancy, and vagueness — all pointing the same direction: toward style over substance. Preference models favor these biased responses in over 60% of instances, with approximately 40% miscalibration compared to human preferences.

Five Systematic Biases in AI Output
Length
Structure
Jargon
Sycophancy
Vagueness
>60% of instances in which preference models favor the biased response
All five biases point the same direction: toward style over substance

The vagueness bias is particularly worth noting. Models favor "broad statements that cover multiple aspects superficially, rather than providing concrete information," likely because "vague statements are less falsifiable, and thus less penalized in training data."15 Vagueness is rewarded because it avoids being wrong. But in expert communication, specificity is where the value lies. An expert who says "there are several risk factors to consider" is adding nothing. An expert who says "this risk factor is the one that will sink us, and here is why" is doing the work.

Analysis of LLM academic writing confirms the pattern: "while structurally coherent, LLM-generated texts often lack the rhetorical flexibility and evaluative sophistication of human academic writing." AI prefers "manner nouns for descriptive precision" while human writers favor "status nouns for evaluative reasoning and evidential nouns for empirical grounding."16 The AI text is organizationally coherent and argumentatively inert. It has the skeleton of academic argument without the flesh of evaluative commitment.

And why does this matter? Because the substitution is invisible to surface inspection. Research on AI text detection has found that "even highly-trained applied linguists were not successful in discerning authorship" between human and AI text — even as the texts become measurably less human-like with each new model generation.17 We cannot see the absence of thought. We can only see the presence of style.

The risk is compounded by a finding from research on LLM creativity: "Humans are still approximately 35.7 times more likely to produce standout, top-decile ideas."18 AI generates competent, average-quality work at scale — solutions that are "more feasible and useful than crowdsourced solutions but less novel."19 The custodian's job is distinguishing the exceptional from the merely competent, and recognizing that a polished surface can disguise a shallow foundation.

This is the custodian's problem because the alignment training that makes AI outputs helpful also makes them systematically harder to evaluate. Research has shown that "RLHF makes language models better at convincing our subjects but not at completing the task correctly," with the false positive rate — humans accepting wrong answers as correct — increasing by 24.1% after alignment training.20 The models don't just produce wrong answers; they "learn to defend incorrect answers by cherry-picking or fabricating supporting evidence, making consistent but untruthful arguments, and providing arguments that contain subtle causal fallacies."

The researchers noted an irony worth sitting with: "While RLHF is supposed to control AI, it might deceive humans into believing that they are in control."

This is the custodial paradox. The technology designed to keep humans in the loop — alignment training, helpfulness rewards, preference optimization — is what makes the custodial role hardest. The AI learns to sound right. The custodian must learn to distrust the sound.

What This Means

We are not losing experts. What we are losing is what experts do. The creative, interested undertaking of learning through research and reflection, through making and evaluating arguments, is being replaced by a custodial function — managing searches, curating outputs, and stewarding the boundary between AI production and human accountability.

The custodian is necessary. Without someone who can exercise domain judgment, evaluate what AI skips, re-ground claims in context, and distinguish style from substance, AI-generated knowledge enters the world unaccountable. It is fabricated by a process indifferent to truth, packaged in a form that implies expert judgment, and consumed by audiences whose trust heuristics cannot tell the difference.

But the custodian pays a cost. The quality of the expert's work changes when the work becomes custodial. And the mechanisms that produced the next generation of experts — the slow, frustrating, irreplaceable process of learning by doing — may not survive the efficiency that custodianship promises.

In Part 2, we examine what AI structurally cannot do — not because of current limitations, but because of what expertise fundamentally is.


This is Part 1 of "The Knowledge Custodian," a three-part series on how AI transforms expertise. Part 2 explores the structural limits of AI knowledge — observation, social validation, validity claims, and the authority of the thinker. Part 3 examines the consequences: debate without authority, the agreement trap, false confidence, and what it means for knowledge to be produced by systems that were never participants in producing it.

The research behind this series draws on over 90 papers across linguistics, philosophy, argumentation theory, social theory, AI alignment, mechanistic interpretability, and human-computer interaction. A full reference archive is maintained alongside this work.

Notes

  1. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data — https://aclanthology.org/2020.acl-main.463/
  2. Grounding Gaps in Language Model Generations — https://arxiv.org/abs/2311.09144
  3. Grounding Gaps in Language Model Generations — https://arxiv.org/abs/2311.09144
  4. Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency — https://arxiv.org/abs/2407.08790
  5. Pretrained Language Models as Containers of the Discursive Knowledge — https://www.mdpi.com/2813-0324/8/1/93
  6. Large Models of What? — https://arxiv.org/abs/2407.08790
  7. STORM: Collaborative User Intent Modeling — https://storm.genie.stanford.edu/
  8. How AI Impacts Skill Formation — https://arxiv.org/abs/2601.20245
  9. Design Principles for Generative AI Applications — https://arxiv.org/abs/2401.14484
  10. The Insanity of Relying on Vector Embeddings — https://medium.com/cub3d/the-insanity-of-relying-on-vector-embeddings-why-rag-fails-be73554490b2
  11. Design Principles for Generative AI Applications — https://arxiv.org/abs/2401.14484
  12. Design Principles for Generative AI Applications — https://arxiv.org/abs/2401.14484
  13. Opportunities for LLMs and Discourse in Engineering Design — https://www.sciencedirect.com/science/article/pii/S2666546824000491
  14. Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models — https://arxiv.org/abs/2506.05339
  15. Flattery, Fluff, and Fog — https://arxiv.org/abs/2506.05339
  16. Metadiscursive Nouns in Academic Argument: ChatGPT vs Student Practices — https://www.sciencedirect.com/science/article/abs/pii/S1475158525000451
  17. Do LLMs Produce Texts with "Human-Like" Lexical Diversity? — https://arxiv.org/abs/2508.00086
  18. Has the Creativity of Large Language Models Peaked? — https://arxiv.org/abs/2504.12320
  19. Conceptual Design Generation Using Large Language Models — https://arxiv.org/abs/2306.01779
  20. Language Models Learn to Mislead Humans via RLHF — https://arxiv.org/abs/2409.12822