prepareStep Semantics
Full signature of the per-step hook, 7 overridable fields, 4 typical patterns, and a deep analysis of the mutate-vs-push trap
Why prepareStep is the most critical hook
Almost all context engineering at agent runtime happens inside prepareStep:
- Compacting long contexts (trigger summary / tiered trimming once a token threshold is hit)
- Injecting system-reminders (ephemeral constraints, not in the conversation history)
- Dynamically filtering tool visibility (`activeTools` filtering — show the model only a small candidate subset each step)
- Forcing tool usage (`toolChoice: "required"` to prevent premature exit)
- Dynamically switching models (cheap model for planning, strong model for execution)
- Applying cache control breakpoints (Anthropic prompt cache)
It’s also the easiest hook to misuse — get the reference semantics wrong and you pollute the whole conversation. This page systematically covers the full API signature and common patterns, closing with a breakdown of a real mutate-vs-push trap.
Prerequisite: read Message Reference Model first. This page assumes you know the difference between `stepInputMessages` and `prepareStepResult.messages`.
Full signature
type PrepareStepCallback = (options: {
model: LanguageModel; // Model for this step (overridable per invocation)
steps: StepResult[]; // Completed historical steps (doesn't include current)
stepNumber: number; // Current step index (0-based)
messages: ModelMessage[]; // stepInputMessages — see Message Reference Model
experimental_context: unknown; // Context passed through layers (business-layer-defined shape)
}) => Promise<{
model?: LanguageModel; // Switch model for this step
system?: string; // Override system prompt for this step
messages?: ModelMessage[]; // Messages sent to model for this step (only)
toolChoice?: ToolChoice; // Tool selection strategy for this step
activeTools?: string[]; // Tool visibility list for this step
providerOptions?: ProviderOptions;// Per-step provider options (merged with L1/L2's providerOptions)
experimental_context?: unknown; // Override context for this step (rare)
}> | undefined;
- Return `undefined` (or nothing): the SDK uses defaults for this invocation (`stepInputMessages`, the settings-level `model`, `activeTools`, `toolChoice`, etc.).
- Return partial fields: the SDK overrides only those fields; the others keep their defaults (streamText path `dist/index.js:7195-7220`; generateText equivalent at `4322-4340`).
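This resolution can be sketched as a field-wise merge. The following is an illustration of the behavior described above under that assumption, not the SDK's actual code; the type and function names are made up for the sketch:

```typescript
// Illustrative per-step defaults; names mirror the hook's overridable fields.
type StepConfig = {
  model: string;
  system: string;
  toolChoice: string;
};

// What a prepareStep return looks like: any subset of the fields.
type PrepareStepResult = Partial<StepConfig>;

// undefined → all defaults; a partial object → only the present fields override.
function resolveStepConfig(
  defaults: StepConfig,
  result: PrepareStepResult | undefined,
): StepConfig {
  if (result === undefined) return defaults;
  return {
    model: result.model ?? defaults.model,
    system: result.system ?? defaults.system,
    toolChoice: result.toolChoice ?? defaults.toolChoice,
  };
}
```

Returning `{ toolChoice: 'required' }` thus touches nothing but the tool-choice policy; the settings-level model and system prompt flow through untouched.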
7 overridable fields
| Field | Overrides | Typical use | Scope of override |
|---|---|---|---|
| `model` | This step’s LLM | Light/heavy split, fallback | This step only |
| `system` | This step’s system prompt | A/B prompts, context-based persona | This step only |
| `messages` | Messages sent to the model | Compaction, reminder injection, filtering | This step only |
| `toolChoice` | Tool selection policy | `"required"` to force a tool call, `"none"` to disable, `{ type: "tool", toolName }` to force a specific tool | This step only |
| `activeTools` | Visible tools for this step | Tool search pool narrowing, hiding dangerous tools by context | This step only |
| `providerOptions` | Provider-specific options (merged with L1/L2’s `providerOptions`, not replaced — dist:7219) | Dynamically shift Anthropic `cache_control` breakpoints, adjust reasoning/thinking budget per step, swap OpenAI `seed` per step | This step only |
| `experimental_context` | Context seen by downstream `tool.execute` | Per-step state passing (rare — usually set once at L1) | This step only |
| (no `tools`) | — | Tool set is fixed at L1/L2; prepareStep can filter but not add | — |
Critical constraint: prepareStep cannot add new tools — it can only filter the fixed ToolSet registered at L1/L2 via `activeTools`. If you need a “per-step dynamic tool set”, either register everything at L1 and filter with `activeTools`, or use prepareCall to swap the entire `tools` object (see Lifecycle → prepareCall).
4 typical patterns
Pattern 1: Reminder injection (ephemeral context)
You want to give the model a temporary “system hint” every step (e.g. “no emojis”, “prefer internal tools”) without polluting the conversation history:
prepareStep: async ({ messages, experimental_context }) => {
const ctx = experimental_context as MyContext;
if (ctx.reminders.length === 0) return undefined;
const remindersText = ctx.reminders
.map(r => `<system-reminder>\n${r}\n</system-reminder>`)
.join('\n');
// OK — append a new message; visible this step, gone next step
return {
messages: [
...messages,
{ role: 'user', content: `[system directive — not user input]\n${remindersText}` },
],
};
},
Why this works: see Message Reference Model row 3 — push is the only “auto-disappears next step” pattern.
Pattern 2: Tool narrowing (big toolset → small candidate)
The “tool search pool” pattern — when an agent is wired to MCP or a dynamic tool-discovery source, the available tools may number in the hundreds, but each step the model should only see a small subset relevant to the current context (lower token cost, better selection accuracy):
prepareStep: async ({ experimental_context }) => {
const ctx = experimental_context as MyContext;
const pool = ctx.toolDiscovery;
if (!pool?.active) return undefined;
return {
activeTools: pool.getActiveToolNames(), // e.g. ["read_file", "grep", "tool_search"]
};
},
Why not use `tools`: `tools` is registered at L1 (the full set); `activeTools` is a name filter — the SDK screens by name in `prepareToolsAndToolChoice` (dist/index.js:4330-4334), at zero cost.
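What name-based filtering amounts to can be sketched as follows. This illustrates the observable behavior, not the SDK's internal code, and the value shape of the ToolSet is simplified:

```typescript
// A ToolSet keyed by tool name; the value shape is simplified for illustration.
type ToolSet = Record<string, { description: string }>;

// activeTools is a pure name filter over the fixed L1 ToolSet:
// no list → full set; a list → only the named tools survive.
function applyActiveTools(tools: ToolSet, activeTools?: string[]): ToolSet {
  if (!activeTools) return tools;
  return Object.fromEntries(
    Object.entries(tools).filter(([name]) => activeTools.includes(name)),
  );
}
```

Note that names absent from the ToolSet simply match nothing — filtering can never add a tool, which is exactly the constraint stated above.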
Pattern 3: Force tool usage (prevent premature exit)
Scenario: the agent has a todo list, and must absolutely not produce a final answer while items are pending:
prepareStep: async ({ experimental_context }) => {
const ctx = experimental_context as MyContext;
const hasPendingTodos = ctx.todos?.some(t => t.status !== 'done') ?? false;
if (hasPendingTodos) {
return { toolChoice: 'required' }; // Force a tool call this step
}
return undefined;
},
When to use the `toolChoice` alternatives:
- `'auto'` (default): the model chooses freely
- `'required'`: must call a tool (which tool is up to the model)
- `'none'`: disable tools (pure generation)
- `{ type: 'tool', toolName: 'finish' }`: force calling the specified tool

Typical uses: `'required'` for “pending todos block exit”; `{ type: 'tool', toolName: 'complete' }` at a subagent’s finalize phase to force result delivery; `'none'` for a final summarization step that must not call tools anymore.
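These choices often reduce to one small decision function inside prepareStep. A sketch, where the phase values and the `'complete'` tool name are illustrative assumptions, not SDK concepts:

```typescript
// The toolChoice shapes the SDK accepts for a step.
type ToolChoice =
  | 'auto'
  | 'required'
  | 'none'
  | { type: 'tool'; toolName: string };

// Maps hypothetical agent state to the per-step toolChoice described above.
function pickToolChoice(state: {
  pendingTodos: number;
  phase: 'work' | 'finalize' | 'summarize';
}): ToolChoice {
  if (state.phase === 'finalize') {
    return { type: 'tool', toolName: 'complete' }; // force result delivery
  }
  if (state.phase === 'summarize') {
    return 'none'; // pure generation, no more tool calls
  }
  return state.pendingTodos > 0 ? 'required' : 'auto';
}
```

Keeping the policy in a pure function like this makes it trivially unit-testable, independent of any model call.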
Pattern 4: Dynamic model switching
prepareStep: async ({ stepNumber }) => {
  // First 3 steps (0, 1, 2) use a cheap model for planning, then switch to a strong model
  if (stepNumber < 3) {
    return { model: haikuModel };
  }
  return { model: sonnetModel };
},
Caveat: switching models does NOT switch tool schemas — tool definitions live at L1. Some providers are sensitive to tool schema format (OpenAI vs Anthropic). Before switching models, ensure the tool schema is compatible on both sides.
The mutate-vs-push trap (the canonical anti-pattern)
This is the direct application of Message Reference Model to prepareStep. Below is
a real production case — the original version carried the pollution bug, and was later refactored to the push pattern:
Anti-example (polluting version)
// Anti-pattern: mutating objects inside initialMessages
prepareStep: async ({ messages }) => {
const remindersText = ctx.getRemindersText();
const lastMessage = messages[messages.length - 1];
if (lastMessage.role === 'user') {
// WRONG — direct mutation permanently pollutes this message in initialMessages
lastMessage.content += `\n${remindersText}`;
} else {
// OK — push a new message, gone next step
messages.push({ role: 'user', content: remindersText });
}
return { messages };
},
What’s wrong:
- `lastMessage` is a reference to `stepInputMessages[k]` — the same object as the user message in `initialMessages`.
- `lastMessage.content += ...` creates a new string and assigns it to `.content` — the field is overwritten on the original object, which still sits in `initialMessages`.
- After step 1 pollutes it, steps 2/3/4 each rebuild `stepInputMessages = [...initialMessages, ...responseMessages]` and spread in the polluted message — the reminder is always there.
- Worse, each step re-appends the reminder — it accumulates with every step.
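The pollution can be reproduced without the SDK. The sketch below simulates the per-step rebuild of `stepInputMessages` (the rebuild helper is illustrative) and shows that mutating a shared message object leaks into every later step, while appending a new object does not:

```typescript
type Msg = { role: string; content: string };

// Simulated per-step rebuild: a NEW array each step, the SAME message objects.
const initialMessages: Msg[] = [{ role: 'user', content: 'hello' }];
const responseMessages: Msg[] = [];
const buildStepInput = (): Msg[] => [...initialMessages, ...responseMessages];

// Anti-pattern: mutate the shared last message in step 1...
const step1 = buildStepInput();
step1[step1.length - 1].content += '\n<system-reminder>be terse</system-reminder>';

// ...and the pollution survives into step 2's "fresh" rebuild.
const step2 = buildStepInput();
const polluted = step2[0].content.includes('<system-reminder>'); // true

// Push pattern: appending a new object to the step's array leaves the
// shared history untouched, so the next rebuild starts pristine.
const step3 = [...buildStepInput(), { role: 'user', content: 'reminder' }];
const historyUntouched = initialMessages.length === 1; // true
```

The spread copies the array, not its elements — that single fact explains both the trap and the fix.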
Correct version (uniform push)
prepareStep: async ({ messages }) => {
const remindersText = ctx.getRemindersText();
if (!remindersText) return undefined;
// OK — regardless of whether last is user, push a new message
return {
messages: [
...messages,
{ role: 'user', content: `[system directive — not user input]\n${remindersText}` },
],
};
},
Why this works:
- A new user message object is created each step; it is visible this step only.
- The next step’s `stepInputMessages` rebuild is a pristine `[...initialMessages, ...responseMessages]`; the injected message vanishes.
- No accumulation risk, no shared-reference pollution.
One-sentence verdict
Never mutate the fields of any message object you receive from prepareStep’s messages parameter.
| Operation | Verdict |
|---|---|
| `msg.content += 'x'` | Pollutes |
| `msg.content.push(part)` | Pollutes |
| `msg.metadata = {...}` | Pollutes |
| `msg.role = 'system'` | Pollutes |
| `return { messages: [...messages, newMsg] }` | Safe |
| `return { messages: messages.filter(...) }` | Safe (new array, no element mutation) |
| `return { messages: messages.map((m, i) => i === N ? {...m, content: 'new'} : m) }` | Safe (new object replaces the target object) |
Step-transition timeline (focused view)
Performance notes
prepareStep runs before every step’s model call and is blocking — the model call won’t start until your Promise
resolves.
Common performance pitfalls:
- Sync I/O (`fs.readFileSync`, sync DB queries) — directly blocks the event loop.
- Token counting (`@anthropic-ai/tokenizer` or tiktoken) — the first load takes ~100ms for model weights, and counting every step adds up quickly. Practical fix: cache the prior round’s token estimate and only recount the message segments that changed.
- LLM calls (for summarization during compaction) — a single LLM compaction call takes seconds and doubles your per-step wait time. Practical fix: separate “decide whether to compact” (a cheap sync token estimate, checked against a threshold) from “actually compact” (only then call the LLM).
Rule of thumb: keep prepareStep’s total time (sync + async) under 200ms. Any higher and each step adds
perceptible latency — a 20-step task adds 4+ seconds.
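The "cache and recount only what changed" fix can be sketched with a WeakMap keyed by message object identity. The chars-per-token ratio and helper names below are assumptions for illustration, not a real tokenizer:

```typescript
type ChatMsg = { role: string; content: string };

// Rough heuristic: ~4 characters per token (assumption; a real tokenizer
// would replace estimatePerMessage, leaving the caching scheme unchanged).
const estimatePerMessage = (m: ChatMsg): number => Math.ceil(m.content.length / 4);

// Cache keyed by message object identity. Because the rebuild reuses the
// same message objects across steps (and they must never be mutated),
// unchanged messages are never recounted.
const tokenCache = new WeakMap<ChatMsg, number>();
let recountCalls = 0;

function estimateTokens(messages: ChatMsg[]): number {
  let total = 0;
  for (const m of messages) {
    let cached = tokenCache.get(m);
    if (cached === undefined) {
      cached = estimatePerMessage(m);
      tokenCache.set(m, cached);
      recountCalls++; // only new message objects pay the counting cost
    }
    total += cached;
  }
  return total;
}
```

Identity-keyed caching is only sound under the no-mutation rule from the previous section: a mutated message would keep its stale cached count.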
Division of labor vs prepareCall
| Scenario | Use prepareCall (once per call) | Use prepareStep (once per step) |
|---|---|---|
| Pick model by user tier | Use: decide once at call start | Overkill: recomputes each step |
| Dynamically switch model by context | Can’t: don’t know at call start | Use: decide each step from steps / messages |
| Inject system-reminder | Can’t: reminder is runtime state | Use: correct scenario |
| Compact messages | Can’t: one-shot at call start, no reaction to growth | Use: compact only when token threshold hit |
| Replace entire tool set | Use: prepareStep can’t add tools | Can’t: only filters, doesn’t add |
Further reading
Related SDK chapters
- Message Reference Model — the theoretical foundation of this page
- Runtime Lifecycle — prepareStep’s position in the full lifecycle
- UI Stream Orchestration — the other face of L3
Zapvol landing reference
- Context Compaction — three-tier compaction built on prepareStep
- Tool Search — dynamic `activeTools` filtering via prepareStep
- `packages/backend/src/agent/agent-stream.ts` — assembly location for the prepareStep implementation