Compaction
The operation that replaces window content with a more compact form — spectrum, triggers, preservation, custom-instruction tuning, design extensions, measurement, and a cross-framework reference
Why Compaction Matters
Long-running agents fill their context windows. Without compaction they either crash at the hard limit or degrade gradually (context rot) as the window grows past the model’s effective working range. Compaction is the operation that replaces some of the window’s content with a more compact representation so the agent can keep working.
Every other technique in this section — caching, just-in-time loading, sub-agent isolation — reduces how much context enters the window. Compaction is what you do when it enters anyway.
Design Decisions at a Glance
Seven decisions shape a compaction design, roughly in this order:
- Do you need compaction? — usage profile. Short tasks often don’t.
- Which tier(s) of the spectrum? → § The Compression Spectrum
- Trigger strategy — fixed threshold, autonomous, task-boundary, or hybrid? → § Trigger Strategies
- Preservation policy — what survives verbatim, what gets compacted, what drops? → § Preservation Policy
- Custom instructions — how to encode the policy as prompt text. → § Tuning With Custom Instructions
- Situation-specific extensions — multi-agent, caching, recovery? → § Design Extensions
- Measurement — how do you verify it works? → § Measuring Compaction
Read the rest of this page in order; each section builds on the previous.
The Compression Spectrum
Compaction is not one operation — it is a spectrum of techniques with sharply different cost-fidelity tradeoffs. Pick the lightest one that works; escalate only when it doesn’t.
| Technique | What it does | Fidelity loss | Compute cost |
|---|---|---|---|
| Tool result clearing | Drop content of already-consumed tool outputs | Lossless if the agent moved past it | None |
| Tool result truncation | Cap tool output at N chars; keep head + tail | Lossy — middle detail gone | None |
| Round-level replacement | Replace older turns with a pre-computed summary | Lossy — summary captures gist | Paid once, at write |
| Full-conversation summarization | LLM re-reads whole conversation, writes a new starting point | Most lossy — narrative end-to-end compressed | Full LLM call |
Which Tier When
Which tiers you enable depends on how long and how tool-heavy your agent is:
- Short conversations, modest tool use — tier 1 (clearing) alone is often enough.
- Long conversations, moderate tool use — tiers 1 + 2 (clearing + truncation).
- Long conversations, dense tool output — tiers 1 + 2 + 3 (add round-level replacement).
- Very long sessions that repeatedly approach the context limit — all four tiers, with full summarization as last resort.
Starting with the lightest tiers and escalating when needed is cheaper than always running full summarization. But don’t wait until the last moment: heavier tiers need room to work, and triggering at 98% leaves little slack for the summarizer itself.
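A minimal sketch of this escalation, with toy tier implementations over a list of message dicts. The `role`/`content` shape and the character-based budget are illustrative assumptions, not any framework's API:

```python
def clear_consumed(history):
    # Tier 1: drop bodies of tool results the agent has already moved past
    # (here: every tool result except the most recent one).
    tool_indices = [i for i, m in enumerate(history) if m["role"] == "tool"]
    for i in tool_indices[:-1]:
        history[i] = {**history[i], "content": "[cleared]"}
    return history

def truncate(history, cap=200):
    # Tier 2: cap remaining tool output, keeping head and tail.
    out = []
    for m in history:
        content = m["content"]
        if m["role"] == "tool" and len(content) > cap:
            m = {**m, "content": content[: cap // 2] + " ... " + content[-cap // 2:]}
        out.append(m)
    return out

def compact(history, window_chars, target=0.6):
    # Escalate lightest-first; stop as soon as usage is back under target.
    for tier in (clear_consumed, truncate):
        if sum(len(m["content"]) for m in history) / window_chars <= target:
            break
        history = tier(history)
    return history
```

Tiers 3 and 4 would slot into the same loop as further entries, each paying more compute for more reclaimed space.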
Selective vs Uniform Truncation
Truncating every tool output at the same character count is cheap but wasteful. A 50-line file read can safely be kept verbatim; a 50,000-line database dump needs aggressive trimming. Tune per tool: let each tool declare its own truncation policy (keep head, keep tail, keep summary).
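One way to express per-tool policies; the tool names and caps below are illustrative, not taken from any real framework:

```python
# Per-tool truncation policies instead of one uniform cap.
POLICIES = {
    "read_file":  {"cap": 8000, "keep": "head"},  # short reads usually fit whole
    "run_query":  {"cap": 1000, "keep": "tail"},  # errors surface at the end
    "web_search": {"cap": 2000, "keep": "head"},
}
DEFAULT_POLICY = {"cap": 2000, "keep": "head"}

def truncate_output(tool_name, text):
    policy = POLICIES.get(tool_name, DEFAULT_POLICY)
    cap = policy["cap"]
    if len(text) <= cap:
        return text  # small outputs pass through verbatim
    if policy["keep"] == "tail":
        return "[truncated] ... " + text[-cap:]
    return text[:cap] + " ... [truncated]"
```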
The Recall-First, Precision-Second Recipe
> Start by maximizing recall — ensure the compaction prompt captures every relevant piece of information. Then iterate to improve precision — eliminate redundant tool outputs and messages.
>
> — Effective Context Engineering for AI Agents, Anthropic, 2026
Build your compaction so it errs on the side of preserving too much. Once you see what the agent actually re-reads post-compaction, trim. Doing this in reverse — starting tight, loosening only when you notice loss — fails because you won’t notice the loss until a user complains weeks later.
Trigger Strategies
The hardest question is not how to compress but when. Four strategies cover the space.
Fixed Threshold
Compaction fires when total tokens cross a fixed ratio of the context window (commonly 60–85%) or an absolute value (Anthropic’s API defaults to 150,000 tokens, minimum 50,000).
- Strengths: predictable, easy to reason about, no model judgment involved.
- Weaknesses: may trigger at a bad moment — mid-reasoning, mid-tool-chain, or right when the agent was about to finish. The agent loses continuity because the trigger was token-count-based, not task-aware.
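As a sketch, combining a ratio with an absolute floor (the 75% ratio here is a stand-in from the common range above, not any API's default):

```python
def should_compact(total_tokens, window_size, ratio=0.75, floor=50_000):
    # Fire when usage crosses the ratio, but never below an absolute floor,
    # so small contexts are not compacted prematurely.
    return total_tokens >= max(floor, int(window_size * ratio))
```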
Autonomous Triggering
The agent decides when to compress. LangChain’s Deep Agents and Claude Code both offer this shape. The agent typically fires compaction at:
- Task transitions — after completing a sub-goal
- Post-extraction — right after extracting results from a long document or tool output
- Pre-ingestion — before pulling in a large new context
- Multi-step boundaries — before starting a refactor, migration, or analysis
- Strengths: compression happens at moments where little state is lost. The agent picks the gap between logical units of work.
- Weaknesses: model judgment varies. Agents can under-compress (wait too long, hit the ceiling) or over-compress (compress so often the summary-of-summary degrades). Needs tuning.
Task-Boundary Compaction
The harness, not the agent or a ratio, declares compaction points. Each workflow stage ends with compaction; the summary becomes the input to the next stage. Pipelines, multi-agent handoffs, and structured workflows use this.
- Strengths: compaction is part of the architecture, not an emergency response. No surprise triggers, no judgment calls, clean seams between stages.
- Weaknesses: only works if the workflow has natural seams. Open-ended agent loops don’t.
None (Crash at the Limit)
Worth naming because many early agents shipped this way — no compaction; the context fills until the model errors out or silently truncates. The only mitigation is “keep conversations short”. Acceptable for short-lived agents; unacceptable for anything that runs past a few turns.
Hybrid Is the Production Default
Most production systems combine fixed threshold as safety net + autonomous or task-boundary as primary. The threshold catches cases where the primary didn’t fire in time; the primary avoids the worst-moment triggers that fixed thresholds are famous for.
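A sketch of the hybrid: a soft threshold that fires only at good moments, backed by a hard threshold that always fires. The boundary signals are hypothetical inputs your harness would supply:

```python
def compaction_due(total_tokens, window_size,
                   at_task_boundary=False, agent_requested=False,
                   soft=0.60, hard=0.85):
    usage = total_tokens / window_size
    if usage >= hard:
        return True                # safety net: fire regardless of timing
    if usage >= soft:
        # primary: fire only at a clean seam the agent or harness identified
        return at_task_boundary or agent_requested
    return False
```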
Preservation Policy
Once you know which tier and when it fires, the substantive design question is: what survives? This decision affects agent coherence post-compaction more than any other choice.
Three tiers of preservation:
Always Preserve Verbatim
- The user’s current-task turn (most recent N rounds, typically 3–10)
- Currently-open files with their latest state
- In-flight tool calls that haven’t returned yet
- The system prompt and memory index — these aren’t part of the compacted region, but naming them reminds you not to accidentally compact them
Preserve as Compact References
- Decisions that constrain future behavior — keep the decision, drop the deliberation
- Files that were read but not modified — keep the paths, not the content
- User intent — the original ask and key clarifications, compressed but prominent
- Architectural commitments — technology choices, interface contracts
- Unresolved issues — bugs found but not fixed, open questions
Drop Freely
- Tool output that was read and synthesized into a decision
- Agent’s own intermediate reasoning that led to a committed decision
- Exploration branches that were abandoned
- Routine acknowledgments — “OK, let me check…”
- Superseded decisions — older versions of a plan overwritten by newer ones
Where the Line Sits
The boundary between “always preserve” and “compact reference” is where most compaction bugs live. A safe heuristic:
- Anything the user would visibly notice the loss of → verbatim tier
- Anything the agent needs to remember but can reconstruct if needed → compact reference
- Everything else → droppable
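A sketch of the three tiers as one sorting pass; the `kind` and `summary` fields are hypothetical annotations your harness would need to maintain on each message:

```python
def apply_policy(history, keep_recent=6):
    # Split history into the verbatim tier (recent rounds, kept unchanged)
    # and the compact-reference tier (one-liners); everything else drops.
    cutoff = max(0, len(history) - keep_recent)
    verbatim, references = [], []
    for i, msg in enumerate(history):
        if i >= cutoff:
            verbatim.append(msg)
        elif msg.get("kind") in {"decision", "user_intent", "open_issue"}:
            references.append(msg["summary"])
        # droppable tier: synthesized tool output, abandoned branches, etc.
    return references, verbatim
```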
Tuning With Custom Instructions
Your preservation policy is what survives. Custom instructions are the how — the actual prompt text that encodes the policy for the summarizer LLM. Default compaction prompts optimize for a generic conversation and often get your use case wrong: dropping architectural decisions, keeping routine tool exchanges, forgetting user-stated preferences.
Every serious compaction system exposes a way to replace or augment the default prompt:
- Anthropic API — `instructions` parameter completely replaces the default prompt
- Claude Code — `/compact "focus on the recent database migration"` appends user guidance
- LangChain Deep Agents — middleware-level configuration of the summarization prompt
- OpenAI Agents SDK — custom summarizer function in the session configuration
Writing Good Custom Instructions
Translate each preservation tier from the previous section into prompt text:
- Verbatim tier — list explicitly: “Preserve the original user request verbatim. Keep the last N rounds unchanged. Preserve current file-edit state.”
- Compact-reference tier — describe the compression: “Summarize architectural decisions as one sentence each. List unresolved issues with one line per issue. Keep file paths even when content is compressed away.”
- Drop tier — state what to discard: “Omit routine acknowledgments, abandoned exploration branches, and tool outputs that have been synthesized into committed decisions.”
Keep the prompt short and prescriptive. Long summarization prompts with many nested rules tend to confuse smaller summarizer models; a clean bullet list of what-to-keep and what-to-drop performs better than paragraphs of guidance.
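Put together, a policy encoded this way might read as follows (illustrative wording, not any framework's default prompt):

```python
COMPACTION_INSTRUCTIONS = """\
Summarize the conversation so the agent can continue the current task.

KEEP VERBATIM:
- The original user request and key clarifications
- The last 5 rounds, unchanged
- Current file-edit state

COMPRESS TO ONE LINE EACH:
- Architectural decisions (the decision, not the deliberation)
- Unresolved issues and open questions
- Paths of files read but not modified

OMIT:
- Routine acknowledgments
- Abandoned exploration branches
- Tool outputs already synthesized into committed decisions
"""
```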
Verify With Replay
Before trusting a custom compaction prompt in production, test it by replay: take a known-good long task, force compaction at varying points, continue execution from the compacted state, and check whether the continued work matches the uncompacted baseline. Failures here are the signal that the prompt is dropping something it shouldn’t.
This is the same loop formalized in § Measuring Compaction — mentioned here because it is the verification step for every change you make to custom instructions, preservation policy, or trigger strategy.
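The replay loop can be sketched as below; every hook (`run_agent`, `compact_at`, `task_succeeded`) is a hypothetical harness function you would supply:

```python
def replay_check(task, compaction_points, run_agent, compact_at, task_succeeded):
    # Force compaction at several points in a known-good task and flag any
    # point where the continued run no longer succeeds.
    baseline = run_agent(task, resume_from=None)   # uncompacted reference run
    failures = []
    for point in compaction_points:
        compacted = compact_at(baseline, point)    # force compaction here
        resumed = run_agent(task, resume_from=compacted)
        if not task_succeeded(resumed):
            failures.append(point)  # the prompt dropped something needed
    return failures
```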
Design Extensions
The basic design (spectrum + triggers + preservation + custom instructions) handles single-agent setups without caching or long-horizon recovery requirements. The three extensions below apply when your situation has the specific conditions named — not every agent needs them.
Multi-Agent Coordination
Applies when: multiple agents share context (common conversation, shared state) or their compaction decisions affect each other.
Each agent has its own context window. Compaction design has to decide: does each agent compress independently, or is there a coordinator?
Distributed compaction (common default) — each agent compacts independently. Simple, no coordinator needed. Works well when agents have mostly independent contexts (main delegates to sub-agent, sub-agent returns summary).
- Downside: if agents share substantial context, each duplicates the compression work and may compress to slightly different summaries, causing drift.
Centralized compaction (AutoGen’s pattern) — one coordinator compresses the shared conversation and broadcasts the result. AutoGen’s CompressibleGroupManager is the published example.
- Upside: single source of truth. All agents agree on what was said and what it means.
- Downside: requires a coordinator role; becomes a bottleneck under load.
Guidance: independent agents → distributed. Shared conversation (collaborative editing, shared thread) → centralized. Parent-child with clean handoff → distributed (parent sees only the child’s summary anyway).
Prompt Cache Integration
Applies when: you’re using prompt caching (you probably are — caching is typically 10× cheaper on hits).
Compaction and caching interact subtly. Caches hit only on a stable prefix. Every compaction event replaces content, which can invalidate prefixes.
Three patterns to respect:
- Protect the cached head. The most stable content (tools, system prompt, durable examples) lives before the compacted region. Compaction replaces history after the head, not the head itself.
- Cache the summary itself. Anthropic’s API lets you place `cache_control` on the compaction block — the summary text gets cached on write, so the next call reads it cheaply.
- Don’t write compaction output before the cached region. If the implementation prepends compaction output as “new context”, every compaction event breaks caching. Compaction must write after the cached prefix.
Resumability and Re-Compaction
Applies when: your agent runs long enough that failure recovery matters, or a single conversation may be compacted multiple times.
Compaction is lossy. If execution fails after compaction but before task completion, can you recover?
Three mechanisms:
- Pre-compaction logging — before compaction fires, log the full pre-compaction state to durable storage. On failure, reload and try a different strategy.
- Compaction checkpoints — the compaction output includes a reference to the pre-compacted state. Resumption loads the summary but keeps the reference for re-expansion if needed.
- Parallel recovery channels — an independent, always-appended artifact (memory, notes, auto-memory) captures key decisions outside the compacted region. Claude Code’s Auto Memory is an instance.
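The first two mechanisms can be sketched together — full state to durable storage, a reference kept in the compacted output. Local JSON files stand in for real durable storage here:

```python
import json
import pathlib
import uuid

CHECKPOINT_DIR = pathlib.Path("checkpoints")

def checkpoint_then_compact(history, summarize):
    # Pre-compaction logging: persist the full state before it is lost.
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    ref = str(uuid.uuid4())
    (CHECKPOINT_DIR / f"{ref}.json").write_text(json.dumps(history))
    # Compaction checkpoint: the output carries a pointer for re-expansion.
    return {"summary": summarize(history), "checkpoint_ref": ref}

def recover(ref):
    # On failure, reload the pre-compaction state and retry differently.
    return json.loads((CHECKPOINT_DIR / f"{ref}.json").read_text())
```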
Designing for re-compaction: at some point the compacted summary itself will be re-compacted (summary-of-summary). Each pass loses fidelity. Anticipate:
- Cap the compaction count on any given conversation; beyond N, start a new session with explicit hand-off.
- Preserve a “core identity” region that never gets re-summarized — user intent, architectural decisions.
- Weight quality metrics differently for older, multi-compressed summaries.
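These precautions can be sketched as a guarded re-compaction step; the state structure and field names are assumptions for illustration:

```python
MAX_PASSES = 3  # beyond this, start a new session with explicit hand-off

def recompact(state, summarize):
    if state["passes"] >= MAX_PASSES:
        raise RuntimeError("re-compaction cap reached: hand off to a new session")
    return {
        "core": state["core"],           # never re-summarized: user intent,
                                         # architectural decisions
        "summary": summarize(state["summary"]),
        "passes": state["passes"] + 1,   # lets metrics down-weight old summaries
    }
```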
Measuring Compaction
Compaction is high-risk: lossy operations on long-running state. It rewards careful measurement more than most context-engineering topics.
Five signals to track:
| Signal | What it tells you |
|---|---|
| Compression ratio | Input tokens / summary tokens. Too high = over-compression. Too low = wasted work. |
| Trigger precision | Fraction of compactions that were actually needed. Low = triggering too early. |
| Post-compaction regression | Force compaction mid-task and replay. Tasks that fail only post-compaction identify what the compactor should preserve. |
| Summary-of-summary degradation | How does fidelity drop on the Nth re-compaction? If steep, cap the re-compaction count. |
| Compaction cost amortization | Summary call cost ÷ turns-until-next-compaction. Helps decide whether each tier is worth its compute. |
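The first and last rows of the table reduce to two one-line computations:

```python
def compression_ratio(input_tokens, summary_tokens):
    # e.g. 40,000 tokens compressed to 2,000 gives a ratio of 20
    return input_tokens / summary_tokens

def amortized_cost(summary_call_cost, turns_until_next_compaction):
    # cost of the summarization call spread over the turns it bought
    return summary_call_cost / turns_until_next_compaction
```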
Synthetic Regression Suite
The single most valuable test: keep 10–20 long tasks that historically succeeded. Force compaction at fixed and variable points. Replay them. Flag any regression.
Run this suite whenever you change compaction logic — custom instructions, trigger strategy, preservation policy. It catches more problems than any other measurement approach.
Cross-Framework Reference
A survey of how the major frameworks expose compaction. Useful for calibration against industry practice; not a prescription.
| Framework | Trigger style | Preservation | Customization | Multi-agent | Recovery |
|---|---|---|---|---|---|
| Anthropic API | Fixed threshold (configurable) | Automatic or manual pause_after | instructions parameter | Per-conversation | compaction block + cache_control |
| Claude Code | Auto + /compact command | Recent turns + auto memory | /compact "focus on ..." | Per-session | Auto Memory parallel channel |
| LangChain Deep Agents | Autonomous (model decides) | Recent 10% + middleware rules | Summarization middleware | Per-agent | Virtual filesystem of history |
| OpenAI Agents SDK | Trim (drop) or summarize | Last N turns verbatim | Custom summarizer function | Per-session | Session store |
| CrewAI | respect_context_window | summarize_messages() chunks | Limited | Per-agent | Shared memory class |
| AutoGen | Per-manager | Shared conversation compression | Group manager configuration | Centralized (unique) | Delegated to coordinator |
Design Bets Each Framework Represents
No framework is “best” — they reflect different design choices:
- AutoGen bets on coordinated multi-agent — worth it when agents share context heavily
- LangChain bets on agent judgment — worth it for autonomous agents with variable workloads
- OpenAI bets on simplicity — two options (trim / summarize), clear mental model
- Anthropic bets on configurable server-side primitives — infrastructure, not policy
- CrewAI bets on one-setting simplicity — opinionated defaults, no choice paralysis
- Claude Code bets on developer-in-the-loop — `/compact` with user instructions, less autonomy
If you’re choosing a framework with compaction in mind, name which bet matches your agent shape first; the rest follows.
Related Reading
- ← Overview — Return to the section hub.
- Context Management — The rest of the runtime discipline: attention budgets, just-in-time context, sub-agent isolation, checkpointing. Compaction fires when those techniques hit their limits.
- Memory Design — Structured notes and memory complement compaction: what persists outside the window doesn’t need to be compressed, only retrieved.
- From Case to Paradigm → Part 2 — “Where the Method Stops Scaling” names when compaction becomes a design-essential rather than an emergency response.
Sources
- Compaction — Anthropic, Claude API docs (beta `compact-2026-01-12`)
- Automatic context compaction — Anthropic, Claude Cookbook
- Autonomous Context Compression — LangChain, 2026
- Context Engineering for Deep Agents — LangChain Docs
- Context Engineering — Short-Term Memory Management with Sessions — OpenAI Agents SDK Cookbook
- Memory — CrewAI Concepts
- Memory and RAG — AutoGen
- Effective Context Engineering for AI Agents — Anthropic, 2026