Runtime Lifecycle

The full timeline of a single agent.stream() call — 12 callback firing points across three layers, two-layer same-named callback comparison, stopWhen / timeout default chains

At a glance

A single agent.stream() call involves 3 API layers, 12 callbacks, 4 message chains, and 2 pairs of same-named callbacks. This page pins each one to the timeline.

| Key number | Value |
| --- | --- |
| Total callbacks | 12 (deduplicated) |
| Same-named callback pairs | 2 (onStepFinish × 2 / onFinish × 2) |
| stopWhen default (L1) | stepCountIs(20) |
| stopWhen default (L2 streamText) | stepCountIs(1) |
| Timeout tiers | 3 (totalMs / stepMs / chunkMs) |
| Pinned SDK version | ai@6.0.134 |

Three-layer capability matrix

Build the map before reading the timeline — for every parameter/callback, know which layer it belongs to:

| | L1: new ToolLoopAgent({...}) | L2: agent.stream({...}) | L3: result.toUIMessageStream({...}) |
| --- | --- | --- | --- |
| Role | Static config (define agent) | Single invocation (trigger one run) | Downstream consumer (transform result to UI stream) |
| Lifecycle | Constructed once, reused | Called once per invocation | Called once per invocation |
| Structural params | id, model, instructions, tools, experimental_context, providerOptions | messages / prompt, abortSignal, timeout | originalMessages, generateMessageId, sendReasoning, sendSources, sendStart, sendFinish |
| Behavior hooks | stopWhen, prepareStep, prepareCall | experimental_transform | — |
| Callbacks (firing order) | experimental_onStart → prepareStep → experimental_onStepStart → experimental_onToolCallStart → experimental_onToolCallFinish → onStepFinish → onFinish | Same as L1 (merged with L1 same-named callbacks; settings fires first) | messageMetadata → onStepFinish → onFinish → onError |

L1 and L2 same-named callbacks are merged: if both layers set onStepFinish, L1’s fires first, then L2’s (source: dist/index.js:8224-8232). L3’s same-named callbacks are entirely independent — they don’t merge with L1/L2 and have different payloads.
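The merge semantics can be sketched in a few lines. This is a simplified stand-in for the SDK's internal wrapper, not its actual code:

```typescript
// Simplified stand-in for the SDK's merge (not its actual code): when both
// layers define onStepFinish, the wrapper runs the L1 (settings) callback
// before the L2 (call options) callback.
type StepCallback = (stepResult: unknown) => void | Promise<void>;

function mergeStepCallbacks(
  fromSettings?: StepCallback, // L1: ToolLoopAgent settings
  fromOptions?: StepCallback,  // L2: agent.stream() options
): StepCallback {
  return async (stepResult) => {
    await fromSettings?.(stepResult); // L1 fires first
    await fromOptions?.(stepResult);  // then L2
  };
}

// Observe the firing order:
const order: string[] = [];
const merged = mergeStepCallbacks(
  () => { order.push("L1"); },
  () => { order.push("L2"); },
);
await merged({});
console.log(order); // ["L1", "L2"]
```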

L3’s other entry point: this page’s L3 column focuses on the result.toUIMessageStream() transform path (passively consuming the agent’s result). L3 has another entry point — createUIMessageStream({ execute }) — the execute-driven path, which can actively push custom events and merge multiple streams alongside the agent’s. Both share the same underlying handleUIMessageStreamFinish (index.js:8100 / :8397), so onStepFinish / onFinish / onError fire at the same moments described on this page; but messageMetadata exists only on toUIMessageStream. The execute-driven path is covered in detail in UI Stream Orchestration.

Full timeline diagram

A single N-step agent.stream() call, time flows downward:

```mermaid
sequenceDiagram
  actor Caller
  participant L1 as ToolLoopAgent<br/>(settings)
  participant L2 as streamText<br/>(loop)
  participant L3 as toUIMessageStream<br/>(pipe)
  Caller->>L1: new ToolLoopAgent({ ... })
  Note over L1: Stores settings only<br/>No callbacks fired
  Caller->>L1: agent.stream({ messages, ... })
  L1->>L1: prepareCall(baseCallArgs)
  Note over L1: Upstream hook, called once<br/>per invocation (rare)
  L1->>L2: streamText(mergedArgs)
  L2-->>Caller: experimental_onStart()
  Note over L2: Once, globally
  loop N steps
    L2->>L2: stepInputMessages =<br/>[...initialMessages, ...responseMessages]
    L2-->>Caller: prepareStep({ messages, steps, stepNumber, model })
    Note over L2: May return { messages, system,<br/>model, toolChoice, activeTools,<br/>experimental_context }
    L2-->>Caller: experimental_onStepStart()
    L2->>L2: Model stream begins<br/>"start-step" chunk
    L2->>L3: "start-step"
    L2->>L3: text-delta / reasoning-delta / ...
    loop per tool call
      L2-->>Caller: experimental_onToolCallStart()
      L2->>L2: tool.execute(input,<br/>{ abortSignal, experimental_context,<br/>messages, toolCallId })
      L2-->>Caller: experimental_onToolCallFinish()
    end
    L2->>L2: "finish-step" chunk
    L2->>L3: "finish-step"
    L2-->>Caller: onStepFinish(stepResult) [L1/L2]
    Note over L2: payload: { stepNumber, content,<br/>toolCalls, toolResults,<br/>finishReason, usage, response }
    L3->>L3: each chunk through transform
    L3-->>Caller: messageMetadata({ part })
    Note over L3: Fires per part — hot path<br/>No I/O here
    L3->>L3: "finish-step" chunk passes
    L3-->>Caller: onStepFinish [L3]
    Note over L3: payload: { responseMessage,<br/>messages, isContinuation }
    L2->>L2: isStopConditionMet?<br/>break or continue
  end
  L2->>L2: "finish" chunk<br/>aggregate totalUsage
  L2->>L3: "finish"
  L2-->>Caller: onFinish({ ... }) [L1/L2]
  Note over L2: payload: { finishReason,<br/>totalUsage, steps, content,<br/>response, request, warnings }
  L2->>L3: fullStream closes
  L3->>L3: flush()
  L3-->>Caller: onFinish({ ... }) [L3]
  Note over L3: payload: { responseMessage,<br/>messages, isContinuation,<br/>isAborted, finishReason }
  opt on any error
    L3-->>Caller: onError(error)
  end
```

Three critical observations:

  1. L1/L2 callbacks and L3 callbacks run concurrently — L2 pushes chunks to fullStream while L3’s pipe transforms them. So L1/L2 onStepFinish(n) and L3 onStepFinish(n) happen nearly simultaneously, but as independent event-loop tasks.
  2. L3 onFinish always fires later than L1/L2 onFinish — L3 is a downstream transform, it must wait for fullStream close + consumer drain before flushing. For “after-run” work: use L1/L2 onFinish for engine-side cleanup (token tallying, sandbox close), L3 onFinish for UI-side persistence (saving the assistant message).
  3. messageMetadata runs per chunk — including every text-delta and tool-input-delta. A long multi-tool response can easily emit 1000+ chunks; any synchronous I/O here directly stalls the stream.
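Given observation 3, a hot-path-safe messageMetadata stays synchronous and returns early for delta chunks. A minimal sketch, assuming a simplified part shape (real parts carry more fields):

```typescript
// Hot-path-safe sketch: synchronous, no I/O, and metadata only on the rare
// control parts; every text-delta / tool-input-delta falls through as
// undefined. The Part shape here is a simplified assumption.
type Part = { type: string };

function messageMetadata({ part }: { part: Part }) {
  if (part.type === "finish") {
    // Cheap synchronous values only: no DB reads, no fetches.
    return { finishedAt: Date.now() };
  }
  return undefined; // deltas pass through untouched
}
```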

Callback firing reference

In firing order. L1 = ToolLoopAgent settings, L2 = streamText (direct pass-through from agent.stream), L3 = toUIMessageStream.

| # | Callback | Layer | When | Payload | Use for |
| --- | --- | --- | --- | --- | --- |
| 1 | prepareCall(baseCallArgs) | L1 | Before each agent.stream() starts, after params merge | Full call args; returns overrides | Dynamic model/tools/stopWhen rewrite |
| 2 | experimental_onStart() | L1+L2 | After streamText starts, before first step | None | Init logging/timing |
| 3 | prepareStep({...}) | L1+L2 | Before each step's model call | { messages, steps, stepNumber, model } | Compaction, reminder injection, activeTools filtering, model switching |
| 4 | experimental_onStepStart() | L1+L2 | Before each step's model stream (after prepareStep) | None | Per-step timing marker |
| 5 | experimental_onToolCallStart() | L1+L2 | Before each tool.execute | { toolCall } | Permission audit, pre-retry logic |
| 6 | experimental_onToolCallFinish() | L1+L2 | After each tool.execute | { toolCall, toolResult } | Observability, cache writeback |
| 7 | onStepFinish(stepResult) | L1+L2 | Each step end, after finish-step chunk emitted | StepResult: full step detail | Token tallying, step-level persistence |
| 8 | messageMetadata({ part }) | L3 | Each chunk passing through UI transform | { part } (current chunk) | Attach metadata to UI control chunks |
| 9 | onStepFinish | L3 | Each finish-step chunk passing through UI transform | { responseMessage, messages, isContinuation } | UI-side step-level persistence |
| 10 | onFinish({...}) | L1+L2 | After all steps done, after finish chunk emitted | { finishReason, totalUsage, steps, ... } | Engine-side settlement, cleanup |
| 11 | onFinish({...}) | L3 | After UI stream drain / cancel | { responseMessage, messages, isContinuation, isAborted, finishReason } | UI-side message persistence |
| 12 | onError(error) | L3 | UI transform error / error chunk / onStepFinish throw | Error or string | SSE error serialization; the returned string is written into the error chunk's errorText field sent to the client |

onFinish throws do NOT route here: callOnFinish (index.js:5927-5943) is a bare await with no try/catch. An onFinish throw propagates out through TransformStream’s flush(), rejecting the consumer iterator — it does not invoke onError. Any production onFinish must wrap its own try/catch/finally inside the callback. The full error-capture tiering is covered in UI Stream Orchestration — Error capture.
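A defensive pattern that follows from this is to wrap the callback body yourself, so a persistence failure is logged instead of rejecting the consumer iterator. A sketch; the safeOnFinish helper is hypothetical, not part of the SDK:

```typescript
// Hypothetical safeOnFinish helper (not part of the SDK): wraps an onFinish
// body so a throw is logged instead of escaping through flush().
type FinishCallback<T> = (payload: T) => void | Promise<void>;

function safeOnFinish<T>(body: FinishCallback<T>): FinishCallback<T> {
  return async (payload) => {
    try {
      await body(payload);
    } catch (err) {
      // Route to your own logger; never let the error escape the callback.
      console.error("onFinish failed:", err);
    }
  };
}

// Usage: onFinish: safeOnFinish(async (payload) => { /* persist */ })
```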

Same-named callbacks — the biggest trap

onStepFinish: L1/L2 vs L3

| | L1/L2 (streamText) | L3 (UI stream) |
| --- | --- | --- |
| When | Step loop ends, after finish-step emit | finish-step chunk through UI transform |
| Payload | StepResult: { stepNumber, content, text, toolCalls, toolResults, finishReason, usage, response, request, ... } | { responseMessage: UIMessage, messages: UIMessage[], isContinuation } |
| What you see | Engine view: raw step output (tool call objects, usage breakdown) | Consumer view: accumulated UI message (assistant message structure) |
| Use for | Token tallying (billing), step-level logging, driving compaction / context trimming | Incremental UI message persistence, prefetching |

onFinish: L1/L2 vs L3

| | L1/L2 (streamText) | L3 (UI stream) |
| --- | --- | --- |
| When | After finish chunk emit, before fullStream close | After fullStream close + UI transform flush |
| Payload | { finishReason, totalUsage, steps, content, text, reasoningText, toolCalls, toolResults, response, request, warnings, providerMetadata } | { responseMessage, messages, isContinuation, isAborted, finishReason } |
| Ordering | Earlier (upstream) | Later (downstream drain) |
| Use for | Engine-level one-shot settlement: write total usage to DB, close sandbox, commit compaction checkpoint | UI-level one-shot settlement: persist final assistant message, notify client of completion |

L1/L2 onFinish closure trap: the L1/L2 onFinish payload carries the entire steps array — every step’s full StepResult (content, toolCalls, toolResults, request, response, all of it). If your callback closure captures this payload and pins it on a long-lived reference (e.g. storing it on an outer session object), the entire large object graph from this invocation is held and never GC’d. Long chat conversations amplify this — 20 steps of cumulative StepResult easily reaches hundreds of MB.

Practical guidance:

  • Short settlement logic (token counting, step logs) can live in L1/L2 onFinish — closure releases right after the callback returns
  • Long settlement logic (persistence, background jobs, checkpoint writes) should prefer L3 onFinish; its payload is the folded responseMessage + messages, orders of magnitude smaller
  • Or: use L1/L2 onStepFinish to incrementally collect only what you need into a small variable (numbers / strings / ids only, never the StepResult itself), then settle on that small variable in L3 onFinish
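The third bullet's incremental-collection pattern can be sketched as follows; the StepUsage shape is a simplified stand-in for the SDK's usage object:

```typescript
// Incremental collection: L1/L2 onStepFinish copies only small scalars out
// of each StepResult, so the large step objects stay garbage-collectable;
// L3 onFinish then settles on the small accumulator.
type StepUsage = { inputTokens: number; outputTokens: number };

const stepUsages: StepUsage[] = [];

// L1/L2 onStepFinish: keep numbers only, never the StepResult itself.
function collectStepUsage(stepResult: { usage: StepUsage }): void {
  stepUsages.push({
    inputTokens: stepResult.usage.inputTokens,
    outputTokens: stepResult.usage.outputTokens,
  });
}

// L3 onFinish: settle on the small accumulator.
function settle() {
  const totalTokens = stepUsages.reduce(
    (sum, u) => sum + u.inputTokens + u.outputTokens,
    0,
  );
  return { steps: stepUsages.length, totalTokens };
}
```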

stopWhen default chain — the second trap

Same parameter name, different defaults at L1 and L2:

```
L1 new ToolLoopAgent({ stopWhen? })  default: stepCountIs(20)   ← dist/index.js:8210
L2 streamText({ stopWhen? })         default: stepCountIs(1)    ← dist/index.js:6452
```

ToolLoopAgent.stream() forwards L1’s stopWhen (default 20) to streamText, so the normal path caps the agent at 20 steps.

But if you call streamText(...) directly without setting stopWhen, your agent runs for exactly one step — one tool call and it halts. Classic beginner trap.

Built-in stop conditions (combinable: stopWhen: [stepCountIs(N), hasToolCall('complete')]):

| Factory | Semantics |
| --- | --- |
| stepCountIs(N) | Stop after reaching step N |
| hasToolCall(name) | Stop after a tool call with the given name |

Typical production combo: [stepCountIs(N), hasToolCall('complete')] — the number is a hard ceiling (prevents the agent from spinning in a loop), hasToolCall('complete') is the “task finished” signal (lets the agent declare its own end). Pick N based on task complexity: simple Q&A 10-20, general assistant 30-50, deep research / multi-step editing 50-100.
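The combination has OR semantics: the loop stops as soon as any condition matches the finished step. A sketch with simplified stand-ins for the SDK's stepCountIs / hasToolCall factories (the real ones receive richer step data):

```typescript
// Simplified stand-ins for the stop-condition factories; an array of
// conditions behaves as OR: any match ends the loop.
type Step = { stepNumber: number; toolCalls: { toolName: string }[] };
type StopCondition = (step: Step) => boolean;

const stepCountIs = (n: number): StopCondition =>
  (step) => step.stepNumber >= n; // hard ceiling

const hasToolCall = (name: string): StopCondition =>
  (step) => step.toolCalls.some((call) => call.toolName === name);

const stopWhen = [stepCountIs(50), hasToolCall("complete")];
const shouldStop = (step: Step) => stopWhen.some((cond) => cond(step));

shouldStop({ stepNumber: 3, toolCalls: [{ toolName: "complete" }] }); // true
shouldStop({ stepNumber: 3, toolCalls: [{ toolName: "search" }] });   // false
```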

Three-tier timeout — the third trap

timeout is an object with three granularities:

```ts
agent.stream({
  timeout: {
    totalMs: 600_000,   // Entire invocation: 10 min
    stepMs: 300_000,    // Single step: 5 min (model + tools combined)
    chunkMs: 120_000,   // Gap between two chunks: 2 min
  },
});
```

All three are independent. Any one tripping aborts via AbortSignal.timeout() (dist/index.js:6483-6495).

| Dimension | Watches for | Typical scenario |
| --- | --- | --- |
| totalMs | Absolute call duration | Long-task total ceiling |
| stepMs | Single step from prepareStep to finish-step | Model response stalls |
| chunkMs | Gap between two adjacent chunks | Mid-stream hang (TCP half-open, slow provider thinking phase) |

Tutorials usually only mention totalMs — but chunkMs is the lifesaver in production. A model stream can emit two lines then hang (TCP half-open from the provider, a thinking phase taking too long, etc.); totalMs is nowhere near, but chunkMs terminates it immediately. A solid default in production is chunkMs: 120_000 (2 minutes) — enough to cover long reasoning-model thinking gaps, but not so long that a real disconnect quietly hangs forever.
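The chunkMs check reduces to "did any inter-chunk gap exceed the budget". A pure-logic sketch over recorded arrival times (the SDK itself aborts live via AbortSignal.timeout(), not by post-hoc inspection):

```typescript
// Pure-logic view of the chunkMs check: given the arrival times of chunks,
// find the first inter-chunk gap that exceeds the budget.
function firstChunkGapViolation(
  chunkArrivalsMs: number[],
  chunkMs: number,
): number | null {
  for (let i = 1; i < chunkArrivalsMs.length; i++) {
    if (chunkArrivalsMs[i] - chunkArrivalsMs[i - 1] > chunkMs) {
      return i; // index of the chunk that arrived too late
    }
  }
  return null; // every gap was within budget
}

// Three chunks arrive quickly, then the stream hangs for three minutes:
firstChunkGapViolation([0, 500, 1_000, 181_000], 120_000); // 3
```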

prepareCall — the little-known upstream hook

Besides prepareStep (per-step), L1 also has prepareCall (per-invocation):

```ts
new ToolLoopAgent({
  model, instructions, tools,
  prepareCall: async (baseCallArgs) => {
    // baseCallArgs = all merged args (settings + method options)
    // Return overrides — or return nothing to use baseCallArgs as-is
    return {
      ...baseCallArgs,
      tools: dynamicallyDecideTools(baseCallArgs.messages),
      stopWhen: stepCountIs(deriveStepLimit(user)),
    };
  },
});
```

When to use:

  • Dynamically swap tool sets (user tier / A-B test) without rebuilding the agent instance
  • Pick a model based on the call’s input
  • Set stopWhen per call

prepareCall vs prepareStep:

| | prepareCall | prepareStep |
| --- | --- | --- |
| Layer | L1 | L1 or L2 |
| Invocation count | Once per agent.stream() | Once per step (N times per invocation) |
| Can override | All call args (tools, stopWhen, instructions, messages, model, …) | Per-step args (messages, system, model, toolChoice, activeTools) |
| Use for | Static config dynamization | Runtime context adaptation (compaction, reminders, dynamic tool sets) |

Picking between them: most projects only need prepareStep — runtime compaction, per-step reminder injection, swapping tool sets based on conversation length, these are all per-step scenarios. prepareCall makes more sense for deployments where the same agent instance is reused across requests (the agent is constructed once at module top level, and each HTTP request rewrites call args based on user tier / AB experiments). If your agent is reconstructed per request (the orchestrator layer already does config dynamization), prepareCall is redundant.
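As an illustration of the per-step scenarios above, a prepareStep-style compaction hook might trim history like this; the message shape and the compactMessages helper are hypothetical simplifications:

```typescript
// Hypothetical compaction helper for a prepareStep hook: once the history
// grows, keep the first message (the original task) plus the most recent
// tail. The Msg shape is a simplified stand-in for the SDK's message type.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

function compactMessages(messages: Msg[], keepTail = 20): Msg[] {
  if (messages.length <= keepTail + 1) return messages;
  return [messages[0], ...messages.slice(-keepTail)];
}

// Hypothetical wiring inside the agent settings:
// prepareStep: ({ messages }) => ({ messages: compactMessages(messages) })
```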

experimental_* callbacks — four underrated hooks

These hooks carry the official experimental_ prefix, meaning their signatures may change; in practice they are stable in ai@^6 and are core building blocks for observable agents:

| Hook | Use for | Typical landing scenarios |
| --- | --- | --- |
| experimental_onStart | Call-level “begin” marker | Emit a UI init event, start the whole-call timer, log run start |
| experimental_onStepStart | Step-level “begin” marker | Reset step counter, clear per-step buffer, start per-step timer |
| experimental_onToolCallStart | Tool-level “begin” marker | Audit log, permission check, blocking validation |
| experimental_onToolCallFinish | Tool-level “done” marker | Result caching, metric reporting, tool-level error routing |

Difference from onStepFinish: onStepFinish is the aggregation callback at step end (full StepResult); experimental_onStepStart is the transition point at step start (no payload, pure signal). For “inter-step cleanup” use the former; for “step initialization” use the latter.
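The split can be illustrated with a per-step timer: experimental_onStepStart marks the start (pure signal, no payload), and onStepFinish closes the measurement. A minimal sketch:

```typescript
// Per-step timer: experimental_onStepStart marks the step's start (no
// payload, pure signal); onStepFinish closes the measurement at step end.
const stepStartedAt: number[] = [];
const stepDurationsMs: number[] = [];

const hooks = {
  experimental_onStepStart: () => {
    stepStartedAt.push(Date.now());
  },
  onStepFinish: () => {
    const startedAt = stepStartedAt.pop();
    if (startedAt !== undefined) {
      stepDurationsMs.push(Date.now() - startedAt);
    }
  },
};
```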

Further reading

Related SDK chapters

SDK source anchors (ai@6.0.134)

  • dist/index.js:6441-6532 — streamText entry
  • dist/index.js:6750-6810 — onStepFinish emission
  • dist/index.js:8180-8317 — ToolLoopAgent implementation
  • dist/index.js:7839-8108 — toUIMessageStream implementation

Zapvol landing reference

  • packages/backend/src/agent/agent-stream.ts — the assembly point for all-layer callbacks (L1/L2 settings, L3 toUIMessageStream params, stepUsages incremental-collection pattern)
  • packages/backend/src/agent/agent-factory.ts — how the stopConditions array is constructed
  • apps/server/src/services/task-orchestrator.ts — L3 onFinish used for assistant message persistence