AI SDK Internals
Why Vercel AI SDK internals deserve their own chapter — three-layer API, version pinning, reading order
Why this chapter exists
Zapvol’s agent engine is built on Vercel AI SDK. On the surface we only call three APIs:
new ToolLoopAgent(...), agent.stream(...), result.toUIMessageStream(...). But the details that actually determine
whether an agent behaves correctly are buried in these three layers’ internal state models and callback ordering.
The official docs alone don’t cut it. Four recurring questions in the community all trace back to missing this foundation:
- “I modified messages in prepareStep, why does the change persist on the next step?” (shared object references, not deep copies)
- “Why did my onStepFinish / onFinish fire twice?” (there are actually two same-named callbacks: one at the streamText layer, one at the UI stream layer)
- “I didn’t set stopWhen, why does my agent stop after one step?” (ToolLoopAgent and streamText have different defaults, and the fallback chain has a trap)
- “Why does messageMetadata run so often in my UI layer?” (it’s invoked for every chunk)
None of these are bugs — they’re the SDK’s design semantics. Until you understand them, you’re debugging by trial and error. Once you do, each one becomes an engineering lever you can exploit.
The goal of this chapter: explain these mechanisms once, thoroughly, so you stop guessing when writing agents.
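To make the first question concrete, here is a minimal sketch of the shared-reference effect. It is a toy loop written for this page, not the SDK's step loop: the names runSteps, redactOnFirstStep, and the isolate flag are hypothetical, and the Message type is a simplified stand-in. It only shows why an in-place mutation made in one step is still visible in the next when every step receives the same array.

```typescript
// Toy model only: this loop is NOT the AI SDK's implementation.
type Message = { role: "user" | "assistant"; content: string };
type PrepareStep = (args: { stepNumber: number; messages: Message[] }) => void;

// Runs `steps` iterations and records what each step saw after prepareStep ran.
// `isolate` chooses between handing out the shared array or a per-step copy.
function runSteps(
  initial: Message[],
  steps: number,
  prepareStep: PrepareStep,
  isolate: boolean,
): string[] {
  const seen: string[] = [];
  for (let stepNumber = 0; stepNumber < steps; stepNumber++) {
    const stepMessages = isolate ? initial.map((m) => ({ ...m })) : initial;
    prepareStep({ stepNumber, messages: stepMessages });
    seen.push(stepMessages.map((m) => m.content).join(" | "));
  }
  return seen;
}

// Mutates the first message in place, but only on step 0.
const redactOnFirstStep: PrepareStep = ({ stepNumber, messages }) => {
  const first = messages[0];
  if (stepNumber === 0 && first) first.content = "[redacted]";
};

const shared = runSteps([{ role: "user", content: "secret" }], 2, redactOnFirstStep, false);
const isolated = runSteps([{ role: "user", content: "secret" }], 2, redactOnFirstStep, true);

console.log(shared); // step 0 and step 1 both see "[redacted]"
console.log(isolated); // step 0 sees "[redacted]", step 1 sees "secret" again
```

With the shared array, the step-0 edit leaks into step 1 even though the hook did nothing on step 1. With a per-step copy, it does not. Which of the SDK's message chains behave which way is covered in Message Reference Model.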
Version pinning
This chapter is based on ai@6.0.134 (Zapvol’s pinned version — see packages/backend/package.json). Every source
reference, line number, and callback signature maps to this version. A v7 migration page will be opened when there are
roadmap changes.
ai@^6
├── streamText / generateText ← low-level execution
├── ToolLoopAgent ← agent-style wrapper over streamText
└── createUIMessageStream / toUIMessageStream ← downstream UI consumption
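Because every line number and callback signature in this chapter is tied to one exact version, it can be useful to fail fast when the dependency drifts to a range or a different release. The following is a minimal sketch under stated assumptions: isPinnedTo and assertPinned are hypothetical helpers (they are not part of the ai package or of Zapvol's codebase), and the dependencies object is an illustrative stand-in for the relevant field of a package.json.

```typescript
// Hypothetical helpers; not part of the "ai" package.
// Matches plain MAJOR.MINOR.PATCH only, so ranges (^, ~) and prerelease tags are rejected.
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

/** Returns true only when the dependency spec is an exact version equal to `expected`. */
function isPinnedTo(spec: string, expected: string): boolean {
  return EXACT_VERSION.test(spec) && spec === expected;
}

/** Throws a readable error if `name` is missing or not pinned to `expected`. */
function assertPinned(
  dependencies: Record<string, string>,
  name: string,
  expected: string,
): void {
  const spec = dependencies[name];
  if (spec === undefined) {
    throw new Error(`${name} is not listed as a dependency`);
  }
  if (!isPinnedTo(spec, expected)) {
    throw new Error(`${name} is "${spec}", expected exact pin "${expected}"`);
  }
}

// Illustrative input shaped like the "dependencies" field of a package.json.
const dependencies: Record<string, string> = { ai: "6.0.134" };
assertPinned(dependencies, "ai", "6.0.134"); // passes silently
```

A check like this could run in CI or at process start. A spec such as ^6 would throw, which is the point: a caret range allows upgrades that silently invalidate the source references in this chapter.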
Three-layer API map
Every “why does it work this way” explanation comes back to this map. Locating which layer a given parameter or callback belongs to answers half of all questions.
| Layer | API | Responsibility | Lifecycle |
|---|---|---|---|
| L1 | new ToolLoopAgent({ ... }) | Static agent config: model, instructions, tools, stopWhen, prepareStep, 6 callbacks | Constructed once, reused across stream() / generate() calls |
| L2 | agent.stream({ ... }) | Per-invocation params: messages, abortSignal, timeout — and overrides of the L1 same-named callbacks | Called once per invocation; forwards to streamText |
| L3 | result.toUIMessageStream({ ... }) | Transforms fullStream into a UI message stream (SSE event sequence); has its own independent messageMetadata / onFinish / onError | Called once per invocation; downstream transform, runs concurrently with L1/L2 callbacks |
Critical: L3’s onFinish and L1/L2’s onFinish are not the same thing — different firing time, different
payload, different purpose. Confusing them directly causes token-counting drift and misordered message persistence. Full
comparison in Lifecycle.
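The separation between the two onFinish callbacks, and the per-chunk messageMetadata call, can be illustrated with a toy pipeline. Nothing below comes from the ai package: generate, toUiStream, the Chunk type, and both payload shapes are simplified assumptions made for this sketch. It shows only that the two layers register their callbacks independently and hand them different payloads. The order it prints is a property of the toy, not a claim about the SDK's real ordering, which Lifecycle covers.

```typescript
// Toy model only; none of these types or functions come from the "ai" package.
type Chunk =
  | { type: "text-delta"; text: string }
  | { type: "finish"; totalTokens: number };

type GenerationCallbacks = {
  onFinish?: (event: { totalTokens: number }) => void;
};

type UiCallbacks = {
  messageMetadata?: (event: { chunk: Chunk }) => unknown;
  onFinish?: (event: { text: string }) => void;
};

// "Generation layer": produces chunks, then fires its own onFinish with usage data.
function* generate(parts: string[], callbacks: GenerationCallbacks): Generator<Chunk> {
  for (const text of parts) yield { type: "text-delta", text };
  const totalTokens = parts.length; // stand-in for real usage accounting
  yield { type: "finish", totalTokens };
  callbacks.onFinish?.({ totalTokens });
}

// "UI layer": a downstream transform with its own, independent callbacks.
function toUiStream(source: Iterable<Chunk>, callbacks: UiCallbacks): string[] {
  const events: string[] = [];
  let text = "";
  for (const chunk of source) {
    callbacks.messageMetadata?.({ chunk }); // invoked once per chunk
    if (chunk.type === "text-delta") text += chunk.text;
    events.push(chunk.type);
  }
  callbacks.onFinish?.({ text }); // fires with the assembled message, not with usage
  return events;
}

const log: string[] = [];
let metadataCalls = 0;

toUiStream(
  generate(["Hel", "lo"], {
    onFinish: (e) => log.push(`generation:onFinish tokens=${e.totalTokens}`),
  }),
  {
    messageMetadata: () => {
      metadataCalls++;
      return undefined;
    },
    onFinish: (e) => log.push(`ui:onFinish text=${e.text}`),
  },
);

console.log(log, metadataCalls);
```

Both callbacks are named onFinish, yet one receives token usage and the other receives the assembled text. Persisting messages from the wrong one, or counting tokens in both, produces the drift described above. The metadataCalls counter ends at 3 for three chunks, which is why any work inside messageMetadata should be cheap.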
How to read the six pages
Read in the order “send-side skeleton → hooks → receive-side → end-to-end”. First nail down the timeline of a single
agent.stream() call (12 callbacks across the three layers); then drill down into L3’s two UI-stream paths, message references
inside the step loop, and the deepest per-step hook; hop to the receive side and see how useChat consumes the stream;
finally use “end-to-end” to stitch pages 1-5 together. Pages 1-4 are the send side, page 5 is the receive side, page 6
puts both sides into one picture.
| Order | Page | Which runtime layer |
|---|---|---|
| 1 | Runtime Lifecycle | The skeleton — 12 callbacks across the three API layers (L1 ToolLoopAgent / L2 streamText / L3 toUIMessageStream) on a single timeline; two-layer same-named callbacks; stopWhen / timeout defaults |
| 2 | UI Stream Orchestration | L3 deep-dive / send side — Lifecycle’s L3 only covers the transform path (result.toUIMessageStream()); this page adds createUIMessageStream({ execute })’s execute-driven path, the three writer methods, the custom data-* event protocol, transient semantics, and the full error-capture table |
| 3 | Message Reference Model | Message mechanics inside the step loop — the four message chains (initialMessages / responseMessages / stepInputMessages / prepareStepResult.messages) and their reference semantics |
| 4 | prepareStep Semantics | The deepest per-step hook — overridable fields, typical patterns, the mutate-vs-push trap |
| 5 | Client Consumption (useChat) | The receive side — @ai-sdk/react’s useChat turns the UI stream into React state; the four ChatStatus states, the four ChatTransport choices, UIMessage vs ModelMessage bridging, the three-way onFinish flag split, the client tool state machine, the two-sided resume protocol, 10 common traps |
| 6 | End-to-End Coordination | Both sides together — the full round-trip of one sendMessage (sequence diagram + each hop’s source location), the 25-type UIMessageChunk reference, implementation templates for the three two-sided protocols (abort / resume / error), SSE environment-layer traps |
After these six, you should be able to answer the four opening questions, plus the client-side ones (“why is status
stuck on submitted”, “why are messages still there after onError”, “why isn’t the tool call UI rendering”), and the
two-sided ones (“client stopped but server is still running”, “how does resume actually land”, “why is SSE stuck behind
the reverse proxy”). If not, come back and locate the answer on the map.
Not in scope
- Not an AI SDK tutorial: readers are assumed to know the basics of streamText, generateText, and ToolLoopAgent. For basics, see the official docs.
- UI framework bindings: only useChat is deep-dived (page 5, Client Consumption); the other @ai-sdk/react hooks (useCompletion, useObject) aren’t covered. Zapvol itself does not use useChat for the main conversation (it uses a custom React Query + SSE hook instead; see the case study at the end of page 5) but does use useCompletion for the one-shot AI Assistant side panel.
- Not covering providers (@ai-sdk/anthropic etc.): provider choice is a Zapvol-layer decision; see Agent Engine.
- Not an SDK contribution guide: only discusses the internals an application-layer consumer needs to know.
Why this isn’t in the official docs
Vercel’s docs cover “how to use it”; this chapter covers “why it works this way”. The division is as follows:
| Official docs | This chapter |
|---|---|
| API signatures, parameter lists, code examples | Callback firing order, reference semantics, default fallback chains, counter-intuitive traps |
| Teaches you to write your first agent | Teaches you to debug your third agent when it fails |
| Organized by topic (generating text / tools / streaming) | Organized by actual execution order (three layers × timeline) |
Further reading
- Zapvol Agent Engine: how Zapvol composes AI SDK into its own agent loop
- Context Compaction: three-tier compression built on prepareStep
- Tool Search: dynamic activeTools filtering via prepareStep
- Vercel AI SDK docs: external reference