AI SDK Internals

Why Vercel AI SDK internals deserve their own chapter — three-layer API, version pinning, reading order

Why this chapter exists

Zapvol’s agent engine is built on Vercel AI SDK. On the surface we only call three APIs: new ToolLoopAgent(...), agent.stream(...), result.toUIMessageStream(...). But the details that actually determine whether an agent behaves correctly are buried in these three layers’ internal state models and callback ordering.

The official docs alone don’t cut it. Four recurring questions in the community all trace back to missing this foundation:

  • “I modified messages in prepareStep, why does the change persist on the next step?” (shared object references, not deep copies)
  • “Why did my onStepFinish / onFinish fire twice?” (there are actually two same-named callbacks — one at the streamText layer, one at the UI stream layer)
  • “I didn’t set stopWhen, why does my agent stop after one step?” (ToolLoopAgent and streamText have different defaults — the fallback chain has a trap)
  • “Why does messageMetadata run so often in my UI layer?” (it’s invoked for every chunk)

None of these are bugs — they’re the SDK’s design semantics. Until you understand them, you’re debugging by trial and error. Once you do, each one becomes an engineering lever you can exploit.
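As a preview of the third question, the fallback trap can be modeled in a few lines. This is a simplified mock, not the SDK's code; the only assumption carried over from the text is that the low-level layer falls back to a one-step stop condition when none is supplied (the local `stepCountIs` here mirrors the shape of the real SDK helper of the same name):

```typescript
// Simplified mock of the stopWhen fallback chain. NOT the SDK's real code;
// it only models the behavior: no explicit stopWhen means "stop after 1 step".

type Step = { toolCalls: number };
type StopCondition = (steps: Step[]) => boolean;

// Mirrors the shape of the SDK's stepCountIs helper.
const stepCountIs = (n: number): StopCondition => (steps) => steps.length >= n;

// Mock of the low-level execution loop (streamText-like layer).
function runLoop(opts: { stopWhen?: StopCondition }): Step[] {
  const stopWhen = opts.stopWhen ?? stepCountIs(1); // the fallback default
  const steps: Step[] = [];
  do {
    steps.push({ toolCalls: 1 }); // pretend every step requests another tool
  } while (!stopWhen(steps) && steps.length < 100); // safety bound
  return steps;
}

console.log(runLoop({}).length);                           // 1: stops after one step
console.log(runLoop({ stopWhen: stepCountIs(5) }).length); // 5
```

An agent wrapper that fails to forward its own, looser default down to the execution layer reproduces exactly the one-step surprise from the question above.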

The goal of this chapter: explain these mechanisms once, thoroughly, so you stop guessing when writing agents.
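And as a preview of the fourth question: a mock chunk pipeline in which the metadata callback is consulted once per chunk, as described above. The chunk shape and function names here are invented stand-ins, not SDK types:

```typescript
// Mock chunk pipeline, illustrative only. It demonstrates the per-chunk
// invocation pattern: the metadata callback runs once per CHUNK, not once
// per message, which is why it "runs so often".

type Chunk = { type: "text-delta" | "finish"; delta?: string };

function toUIStream(
  chunks: Chunk[],
  messageMetadata: (part: Chunk) => Record<string, unknown> | undefined,
): number {
  let calls = 0;
  for (const chunk of chunks) {
    messageMetadata(chunk); // consulted for EVERY chunk
    calls++;
  }
  return calls;
}

const chunks: Chunk[] = [
  { type: "text-delta", delta: "Hel" },
  { type: "text-delta", delta: "lo" },
  { type: "finish" },
];

// Three chunks means three metadata invocations. Keep the callback cheap,
// or gate expensive work on chunk type, as in the finish-only example here.
console.log(
  toUIStream(chunks, (part) => (part.type === "finish" ? { done: true } : undefined)),
); // 3
```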

Version pinning

This chapter is based on ai@6.0.134 (Zapvol’s pinned version — see packages/backend/package.json). Every source reference, line number, and callback signature maps to that exact version. A v7 migration page will be opened when the roadmap warrants it.

ai@^6
├── streamText / generateText              ← low-level execution
├── ToolLoopAgent                           ← agent-style wrapper over streamText
└── createUIMessageStream / toUIMessageStream  ← downstream UI consumption

Three-layer API map

Every “why does it work this way” explanation comes back to this map. Locating which layer a given parameter or callback belongs to answers half of all questions.

| Layer | API | Responsibility | Lifecycle |
| --- | --- | --- | --- |
| L1 | new ToolLoopAgent({ ... }) | Static agent config: model, instructions, tools, stopWhen, prepareStep, 6 callbacks | Constructed once, reused across stream() / generate() calls |
| L2 | agent.stream({ ... }) | Per-invocation params: messages, abortSignal, timeout — plus overrides of the L1 same-named callbacks | Called once per invocation; forwards to streamText |
| L3 | result.toUIMessageStream({ ... }) | Transforms fullStream into a UI message stream (SSE event sequence); has its own independent messageMetadata / onFinish / onError | Called once per invocation; downstream transform, runs concurrently with L1/L2 callbacks |
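The lifecycle column can be made concrete with a structural mock. Everything below is a sketch: `MockToolLoopAgent` and its method bodies are invented stand-ins that only mirror the construct-once (L1), call-per-invocation (L2), transform-downstream (L3) shape described above.

```typescript
// Structural mock of the three layers. A sketch, not the real SDK.

type Config = { instructions: string; onFinish?: (result: string) => void };

class MockToolLoopAgent {
  // L1: constructed once, holds static config.
  constructor(readonly config: Config) {}

  stream(opts: { messages: string[]; onFinish?: (result: string) => void }) {
    // L2: per-invocation; a same-named callback here overrides the L1 one.
    const onFinish = opts.onFinish ?? this.config.onFinish;
    const text = `${this.config.instructions}: ${opts.messages.join(" ")}`;
    onFinish?.(text);
    return {
      // L3: downstream transform with its OWN, independent callbacks.
      toUIMessageStream(ui: { onFinish?: (events: string[]) => void }) {
        const events = ["start", `text:${text}`, "finish"];
        ui.onFinish?.(events);
        return events;
      },
    };
  }
}

// One agent (L1), reused per invocation (L2), each with its own UI transform (L3):
const agent = new MockToolLoopAgent({ instructions: "echo" });
const events = agent.stream({ messages: ["hi"] }).toUIMessageStream({});
console.log(events.length); // 3
```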

Critical: L3’s onFinish and L1/L2’s onFinish are not the same thing — different firing time, different payload, different purpose. Confusing them directly causes token-counting drift and misordered message persistence. Full comparison in Lifecycle.
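A toy model of the two same-named hooks makes the separation of duties visible. The payload shapes below are invented for illustration; the SDK's real types differ, and the ordering shown is only how this toy sequences them:

```typescript
// Toy model of the two same-named onFinish hooks. Illustrative payloads,
// not the SDK's real types. The point: different layer, different timing,
// different data.

const fired: string[] = [];

// Execution layer (L1/L2): fires when the model loop completes,
// carrying usage-style data.
function executionOnFinish(payload: { totalTokens: number }) {
  fired.push(`execution:${payload.totalTokens}`);
}

// UI stream layer (L3): fires when the transformed UI stream closes,
// carrying message-style data.
function uiStreamOnFinish(payload: { messages: string[] }) {
  fired.push(`ui:${payload.messages.length}`);
}

// Simulated run: in this toy, the execution loop completes first,
// then the downstream UI transform drains and closes.
executionOnFinish({ totalTokens: 42 });
uiStreamOnFinish({ messages: ["assistant reply"] });

console.log(fired); // [ 'execution:42', 'ui:1' ]
// Count tokens in the execution-layer hook; persist UI messages in the
// UI-layer hook. Mixing them up is the token-counting drift warned about above.
```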

How to read the four pages

Read skeleton-first, then specifics. First nail down the timeline of a single agent.stream() call (three layers × 12 callbacks); then drill down into L3’s two UI-stream paths, the message references inside the step loop, and the deepest per-step hook. Each page builds on the skeleton from the previous:

| Order | Page | Which runtime layer |
| --- | --- | --- |
| 1 | Runtime Lifecycle | The skeleton — 12 callbacks across the three API layers (L1 ToolLoopAgent / L2 streamText / L3 toUIMessageStream) on a single timeline; two-layer same-named callbacks; stopWhen / timeout defaults |
| 2 | UI Stream Orchestration | L3 deep-dive — Lifecycle’s L3 only covers the transform path (result.toUIMessageStream()); this page adds createUIMessageStream({ execute })’s execute-driven path, the three writer methods, the custom data-* event protocol, transient semantics, and the full error-capture table |
| 3 | Message Reference Model | Message mechanics inside the step loop — the four message chains (initialMessages / responseMessages / stepInputMessages / prepareStepResult.messages) and their reference semantics |
| 4 | prepareStep Semantics | The deepest per-step hook — overridable fields, typical patterns, the mutate-vs-push trap |

After these four, you should be able to answer the four opening questions. If not, come back and locate the answer on the map.

Not in scope

  • Not an AI SDK tutorial — readers are assumed to know the basics of streamText, generateText, and ToolLoopAgent. For basics, see the official docs.
  • Not covering UI framework bindings (@ai-sdk/react, useChat, useCompletion) — Zapvol uses custom React Query hooks to consume SSE, not useChat.
  • Not covering providers (@ai-sdk/anthropic etc.) — provider choice is a Zapvol-layer decision; see Agent Engine.
  • Not an SDK contribution guide — only discusses the internals an application-layer consumer needs to know.

Why this isn’t in the official docs

Vercel’s docs cover “how to use it”; this chapter covers “why it works the way it does”. A precise division of labor:

| Official docs | This chapter |
| --- | --- |
| API signatures, parameter lists, code examples | Callback firing order, reference semantics, default fallback chains, counter-intuitive traps |
| Teaches you to write your first agent | Teaches you to debug your third failing agent |
| Organized by topic (generating text / tools / streaming) | Organized by actual execution order (three layers × timeline) |
