AI SDK Internals
Why Vercel AI SDK internals deserve their own chapter — three-layer API, version pinning, reading order
Why this chapter exists
Zapvol’s agent engine is built on the Vercel AI SDK. On the surface we only call three APIs: `new ToolLoopAgent(...)`, `agent.stream(...)`, and `result.toUIMessageStream(...)`. But the details that actually determine whether an agent behaves correctly are buried in these three layers’ internal state models and callback ordering.
The official docs alone don’t cut it. Four recurring questions in the community all trace back to missing this foundation:
- “I modified `messages` in `prepareStep`, why does the change persist on the next step?” (shared object references, not deep copies)
- “Why did my `onStepFinish`/`onFinish` fire twice?” (there are actually two same-named callbacks — one at the `streamText` layer, one at the UI stream layer)
- “I didn’t set `stopWhen`, why does my agent stop after one step?” (`ToolLoopAgent` and `streamText` have different defaults — the fallback chain has a trap)
- “Why does `messageMetadata` run so often in my UI layer?” (it’s invoked for every chunk)
None of these are bugs — they’re the SDK’s design semantics. Until you understand them, you’re debugging by trial and error. Once you do, each one becomes an engineering lever you can exploit.
The goal of this chapter: explain these mechanisms once, thoroughly, so you stop guessing when writing agents.
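As a taste of what “engineering lever” means, consider the third question. The stop-after-one-step behavior is just a default fallback chain. Below is a toy sketch in plain TypeScript — no SDK imports; the `stepCountIs` helper mirrors the SDK name, but the resolver function and the concrete default shown here are illustrative assumptions, not the SDK’s actual source:

```typescript
// Hypothetical sketch of a stopWhen fallback chain (illustrative only,
// not the SDK's real resolution code).
type StopCondition = (ctx: { stepCount: number }) => boolean;

// Mirrors the SDK helper of the same name: stop once n steps have run.
const stepCountIs = (n: number): StopCondition =>
  ({ stepCount }) => stepCount >= n;

function resolveStopWhen(
  callLevel?: StopCondition,  // e.g. agent.stream({ stopWhen })
  agentLevel?: StopCondition, // e.g. new ToolLoopAgent({ stopWhen })
): StopCondition {
  // First defined value wins; if neither layer sets stopWhen, the assumed
  // default ends the loop after step 1 -- the trap from question 3.
  return callLevel ?? agentLevel ?? stepCountIs(1);
}

// No stopWhen anywhere: the loop is told to stop after the first step.
const defaulted = resolveStopWhen();
console.log(defaulted({ stepCount: 1 })); // true

// Agent-level stopWhen keeps the loop alive past step 1.
const agentScoped = resolveStopWhen(undefined, stepCountIs(5));
console.log(agentScoped({ stepCount: 1 })); // false
```

Once you see the chain as data, the fix is mechanical: set `stopWhen` at whichever layer should own the policy, and the more specific layer overrides it per call.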
Version pinning
This chapter is based on `ai@6.0.134` (Zapvol’s pinned version — see `packages/backend/package.json`). Every source reference, line number, and callback signature maps to this version. A v7 migration page will be added when the upstream roadmap changes.
```
ai@^6
├── streamText / generateText                  ← low-level execution
├── ToolLoopAgent                              ← agent-style wrapper over streamText
└── createUIMessageStream / toUIMessageStream  ← downstream UI consumption
```
Three-layer API map
Every “why does it work this way” explanation comes back to this map. Locating which layer a given parameter or callback belongs to answers half of all questions.
| Layer | API | Responsibility | Lifecycle |
|---|---|---|---|
| L1 | new ToolLoopAgent({ ... }) | Static agent config: model, instructions, tools, stopWhen, prepareStep, 6 callbacks | Constructed once, reused across stream() / generate() calls |
| L2 | agent.stream({ ... }) | Per-invocation params: messages, abortSignal, timeout — and overrides of the L1 same-named callbacks | Called once per invocation; forwards to streamText |
| L3 | result.toUIMessageStream({ ... }) | Transforms fullStream into a UI message stream (SSE event sequence); has its own independent messageMetadata / onFinish / onError | Called once per invocation; downstream transform, runs concurrently with L1/L2 callbacks |
Critical: L3’s onFinish and L1/L2’s onFinish are not the same thing — different firing time, different
payload, different purpose. Confusing them directly causes token-counting drift and misordered message persistence.
Full comparison in Lifecycle.
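To make the two-callback point concrete, here is a toy sketch in plain TypeScript. Nothing here is SDK code: the function names echo the SDK’s shape, but the payload types are simplified placeholders, and the real firing order is stream-driven (covered in the Lifecycle page) rather than the sequential calls shown below.

```typescript
// Toy sketch: two distinct onFinish callbacks, one per layer, with
// different payloads. Types are illustrative placeholders, not the
// SDK's real signatures.
type ExecFinishPayload = { steps: number; totalTokens: number }; // L1/L2 side
type UIFinishPayload = { uiMessages: string[] };                 // L3 side

function fakeStreamText(opts: { onFinish: (p: ExecFinishPayload) => void }) {
  // ...the step loop would run here...
  // The execution-layer onFinish reports step count and token usage.
  opts.onFinish({ steps: 2, totalTokens: 42 });
  return {
    toUIMessageStream(ui: { onFinish: (p: UIFinishPayload) => void }) {
      // The UI-layer onFinish reports the final assembled UI messages.
      ui.onFinish({ uiMessages: ["assistant reply"] });
      return ["text-delta", "finish"]; // stand-in for the SSE event sequence
    },
  };
}

const fired: string[] = [];
const result = fakeStreamText({
  onFinish: (p) => fired.push(`exec onFinish: ${p.steps} steps, ${p.totalTokens} tokens`),
});
result.toUIMessageStream({
  onFinish: (p) => fired.push(`ui onFinish: ${p.uiMessages.length} messages`),
});
console.log(fired.length); // 2 -- two callbacks, two layers, two payloads
```

The practical rule falls out of the payload types: count tokens in the execution-layer callback, persist UI messages in the UI-layer one, and never assume one can substitute for the other.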
How to read the four pages
Read skeleton-first, then specifics. First nail down the timeline of a single agent.stream() call (three layers
× 12 callbacks); then drill down into L3’s two UI-stream paths, the message references inside the step loop, and the
deepest per-step hook. Each page builds on the skeleton from the previous:
| Order | Page | Which runtime layer |
|---|---|---|
| 1 | Runtime Lifecycle | The skeleton — 12 callbacks across the three API layers (L1 ToolLoopAgent / L2 streamText / L3 toUIMessageStream) on a single timeline; two-layer same-named callbacks; stopWhen / timeout defaults |
| 2 | UI Stream Orchestration | L3 deep-dive — Lifecycle’s L3 only covers the transform path (result.toUIMessageStream()); this page adds createUIMessageStream({ execute })’s execute-driven path, the three writer methods, the custom data-* event protocol, transient semantics, and the full error-capture table |
| 3 | Message Reference Model | Message mechanics inside the step loop — the four message chains (initialMessages / responseMessages / stepInputMessages / prepareStepResult.messages) and their reference semantics |
| 4 | prepareStep Semantics | The deepest per-step hook — overridable fields, typical patterns, the mutate-vs-push trap |
After these four, you should be able to answer the four opening questions. If not, come back and locate the answer on the map.
Not in scope
- Not an AI SDK tutorial — readers are assumed to know the basics of `streamText`, `generateText`, and `ToolLoopAgent`. For basics, see the official docs.
- Not covering UI framework bindings (`@ai-sdk/react`, `useChat`, `useCompletion`) — Zapvol uses custom React Query hooks to consume SSE, not `useChat`.
- Not covering providers (`@ai-sdk/anthropic` etc.) — provider choice is a Zapvol-layer decision; see Agent Engine.
- Not an SDK contribution guide — only discusses the internals an application-layer consumer needs to know.
Why this isn’t in the official docs
Vercel’s docs cover “how to use”; this chapter covers “why this way works”. A precise division:
| Official docs | This chapter |
|---|---|
| API signatures, parameter lists, code examples | Callback firing order, reference semantics, default fallback chains, counter-intuitive traps |
| Teaches you to write your first agent | Teaches you to debug your third failing agent |
| Organized by topic (generating text / tools / streaming) | Organized by actual execution order (three layers × timeline) |
Further reading
- Zapvol Agent Engine — how Zapvol composes AI SDK into its own agent loop
- Context Compaction — three-tier compression built on `prepareStep`
- Tool Search — dynamic `activeTools` filtering via `prepareStep`
- Vercel AI SDK docs — external reference