# Context Compaction
Compaction works out of the box with no configuration. When your conversation fills the context window, Aura automatically compresses older messages into a summary, freeing space for new content. The default settings handle most use cases — tune only if needed.
## How It Works
- Synthetic trim — At 50% context fill (configurable), duplicate synthetic messages are removed to delay compaction.
- Auto-compaction — At 80% context fill (configurable), compaction triggers automatically.
- Split — History is divided into messages to compact and messages to preserve (last 10 by default). Internal messages (DisplayOnly, Bookmark, Metadata) do not count toward the preserved message count.
- Preprocess — Messages to compact are stripped of synthetic, internal, and ephemeral messages, thinking blocks are removed, and tool results are truncated.
- Summarize — The compaction agent receives the preprocessed messages as structured conversation history and generates a summary that preserves structured data (requirements, checklists, decisions) while compressing narrative.
- Rebuild — History is reconstructed: system prompt + compaction summary + preserved messages.
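The split-and-rebuild steps above can be sketched in Go. The types and field names here are illustrative stand-ins, not Aura's internal API:

```go
package main

import "fmt"

// Message is a simplified stand-in for Aura's internal message type;
// the field names are assumptions for illustration.
type Message struct {
	Role     string
	Internal bool // DisplayOnly, Bookmark, Metadata, etc.
	Text     string
}

// splitForCompaction divides history into messages to compact and messages
// to preserve. Internal messages do not count toward keepLast, mirroring
// the split step above.
func splitForCompaction(history []Message, keepLast int) (toCompact, preserved []Message) {
	if keepLast <= 0 {
		return history, nil
	}
	counted := 0
	cut := 0 // index of the first preserved message
	for i := len(history) - 1; i >= 0; i-- {
		if !history[i].Internal {
			counted++
		}
		if counted == keepLast {
			cut = i
			break
		}
	}
	return history[:cut], history[cut:]
}

func main() {
	history := []Message{
		{Role: "user", Text: "old request"},
		{Role: "assistant", Text: "old answer"},
		{Role: "user", Internal: true, Text: "bookmark"},
		{Role: "user", Text: "recent request"},
		{Role: "assistant", Text: "recent answer"},
	}
	toCompact, preserved := splitForCompaction(history, 2)
	// Rebuild: system prompt + compaction summary + preserved tail.
	summary := Message{Role: "system", Text: fmt.Sprintf("[summary of %d messages]", len(toCompact))}
	rebuilt := append([]Message{summary}, preserved...)
	fmt.Println(len(toCompact), len(preserved), len(rebuilt))
}
```

Note how the internal bookmark message does not count toward the two preserved messages: only the last two non-internal messages anchor the cut point.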
## What Gets Preserved
The compaction prompt is designed to preserve structured data through summarization:
- Requirement checklists and acceptance criteria — reproduced verbatim, not paraphrased
- File paths, package names, and dependency choices — specific identifiers that were decided or required
- Explicit decisions and rationale — only decisions that were stated in conversation, never inferred from errors or work-in-progress state
- Todo state — mechanically preserved (not LLM-generated) and appended to the compaction summary
When a requirement and a conversation action contradict each other (e.g., spec says “use pflag” but code switched to “flag”), the compaction notes both the requirement and the actual state rather than silently resolving the contradiction.
## Manual Trigger
Use `/compact` to trigger compaction manually at any time.
## Configuration
| Setting | Default | Description |
|---|---|---|
| `threshold` | 80 | Context fill % that triggers auto-compaction |
| `max_tokens` | 0 | Absolute token count that triggers compaction (overrides `threshold` when set) |
| `trim_threshold` | 50 | Fill % for synthetic message trimming |
| `trim_max_tokens` | 0 | Absolute token count that triggers trimming (overrides `trim_threshold` when set) |
| `keep_last_messages` | 10 | Messages preserved during compaction |
| `chunks` | 1 | Number of chunks for sequential compaction (1 = single-pass) |
| `agent` | Compaction | Agent for generating summaries |
| `prompt` | | Named prompt for self-compaction (overrides `agent`) |
| `tool_result_max_length` | 200 | Max chars for tool results in compaction messages |
| `prune.mode` | off | When to prune old tool results: `off`, `iteration`, `compaction` |
| `prune.protect_percent` | 30 | % of context window to protect from pruning |
| `prune.arg_threshold` | 200 | Min estimated tokens for tool call args to be prunable |
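The settings in the table map onto the compaction section of the YAML config. A sketch with the defaults spelled out (key names from the table; the nesting is an assumption based on the `compaction.prune:` layout shown later):

```yaml
compaction:
  threshold: 80
  max_tokens: 0
  trim_threshold: 50
  trim_max_tokens: 0
  keep_last_messages: 10
  chunks: 1
  agent: "Compaction"
  tool_result_max_length: 200
  prune:
    mode: "off"
    protect_percent: 30
    arg_threshold: 200
```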
### Threshold modes
Two threshold modes are available (mutually exclusive — max_tokens takes priority when set):
- Percentage mode (default): `threshold: 80` triggers compaction when token usage exceeds 80% of the context window. Requires a known context length (from the provider, registry, or the agent’s `context:` field).
- Absolute mode: `max_tokens: 32000` triggers compaction when the token count exceeds this value. Does not require knowing the context window size. Useful as a safety net when the provider doesn’t report context length, or for provider-independent thresholds.
The same applies to trim_threshold vs trim_max_tokens.
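The two modes can be sketched as a single decision function in Go (the function and parameter names are illustrative, not Aura's identifiers):

```go
package main

import "fmt"

// shouldCompact mirrors the two threshold modes described above: an
// absolute max_tokens takes priority when set; otherwise the percentage
// threshold is applied against a known context length.
func shouldCompact(used, contextLen, maxTokens, thresholdPct int) (bool, error) {
	if maxTokens > 0 { // absolute mode: no context length needed
		return used > maxTokens, nil
	}
	if contextLen <= 0 {
		return false, fmt.Errorf("context length unknown: set max_tokens or the agent's context: field")
	}
	// percentage mode, using integer math to avoid float comparison
	return used*100 > contextLen*thresholdPct, nil
}

func main() {
	ok, _ := shouldCompact(90_000, 100_000, 0, 80) // 90% fill vs 80% threshold
	fmt.Println(ok)
	ok, _ = shouldCompact(40_000, 0, 32_000, 80) // absolute mode ignores context length
	fmt.Println(ok)
}
```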
### Context length
The agent’s context: field sets the effective context window size for compaction decisions. User-configured values always take priority over provider-reported values.
- Ollama: Sets the actual context window (`num_ctx`) on the server AND uses it as the compaction denominator.
- All other providers: The `context:` field is used only as the compaction denominator — it does not affect the provider’s actual context window.
When context length is unknown (provider reports 0, no `context:` field, no `max_tokens`), compaction cannot function and an error is raised.
See Compaction Config for the full YAML.
## Per-Agent Overrides
Agents can override compaction behavior via features.compaction in their frontmatter. Per-agent overrides are merged on top of the global defaults — no separate config surface needed.
### Resolution Order
1. `prompt` is set → self-compact: use the current agent’s own model with the named prompt from `prompts/`. The dedicated compaction agent is bypassed entirely.
2. `agent` is set → use that dedicated agent (its model, provider, and prompt).
3. Neither set → use the default agent from `compaction.yaml`.
4. No agent or prompt configured at all → prune-only: mechanical pruning runs but LLM summarization is skipped. The conversation continues without compaction errors.
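The resolution order reduces to a simple cascade. A Go sketch (the return strings are illustrative labels, not Aura's output):

```go
package main

import "fmt"

// resolveCompactor sketches the resolution order above: prompt wins over
// agent, agent wins over the default from compaction.yaml, and with
// nothing configured the prune-only path is taken.
func resolveCompactor(prompt, agent, defaultAgent string) string {
	switch {
	case prompt != "":
		return "self-compact with prompt " + prompt
	case agent != "":
		return "dedicated agent " + agent
	case defaultAgent != "":
		return "default agent " + defaultAgent
	default:
		return "prune-only"
	}
}

func main() {
	fmt.Println(resolveCompactor("Compaction", "FastCompactor", "Compaction")) // prompt wins
	fmt.Println(resolveCompactor("", "FastCompactor", "Compaction"))
	fmt.Println(resolveCompactor("", "", ""))
}
```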
### Examples
Self-compact with a prompt (agent uses its own model):
```yaml
---
name: MyAgent
model:
  provider: openrouter
  name: anthropic/claude-sonnet
features:
  compaction:
    prompt: "Compaction"
---
```
Use a different dedicated agent:
```yaml
---
name: LightAgent
features:
  compaction:
    agent: "FastCompactor"
---
```
Tweak thresholds only (keep default agent):
```yaml
---
name: BigContextAgent
features:
  compaction:
    threshold: 95
    keep_last_messages: 20
---
```
## Advanced Configuration
### Chunked Compaction
When chunks is set to N > 1, the compactable messages are split into N chunks and compacted sequentially. Each chunk’s summary feeds into the next chunk as context, producing a single coherent summary that preserves more detail than a single-pass compaction of the same history.
Chunk boundaries respect tool call/result pairs — a tool result is never separated from its corresponding assistant message.
Todo state is only included in the last chunk’s prompt to avoid contaminating intermediate summaries.
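The boundary rule can be sketched as a chunk splitter that shifts each boundary forward past any tool results. A minimal Go sketch, assuming a simplified message type with only a role:

```go
package main

import "fmt"

// Msg is a simplified message; a "tool" message is a tool result that
// must stay in the same chunk as the assistant message that called it.
type Msg struct {
	Role string // "user", "assistant", "tool"
}

// splitChunks divides msgs into n roughly equal chunks, moving each
// boundary forward so a tool result is never separated from its
// preceding assistant message.
func splitChunks(msgs []Msg, n int) [][]Msg {
	var chunks [][]Msg
	size := (len(msgs) + n - 1) / n // ceiling division
	start := 0
	for start < len(msgs) {
		end := start + size
		if end > len(msgs) {
			end = len(msgs)
		}
		// Shift the boundary past any tool results.
		for end < len(msgs) && msgs[end].Role == "tool" {
			end++
		}
		chunks = append(chunks, msgs[start:end])
		start = end
	}
	return chunks
}

func main() {
	// The naive boundary after 3 messages would split the assistant/tool
	// pair, so the first chunk absorbs the tool result.
	msgs := []Msg{{"user"}, {"assistant"}, {"assistant"}, {"tool"}, {"user"}, {"assistant"}}
	for _, c := range splitChunks(msgs, 2) {
		fmt.Println(len(c))
	}
}
```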
### Progressive Retry
If compaction fails (e.g., the conversation is too large), it retries with progressively shorter tool result content:
200 chars → 150 chars → 100 chars → 50 chars → 0 chars
For chunked compaction, the retry wraps the entire sequence — if any chunk fails, all chunks are retried at the next truncation level.
### Compaction Recovery
Both the auto-compaction trigger (at threshold) and context-exhaustion errors use the same recovery method: RecoverCompaction. If a single compaction attempt doesn’t free enough context (e.g., a large tool result is trapped in the preserved tail), it retries with progressively lower keep_last_messages:
keepLast = 10 → compact → still exceeds → keepLast = 9 → ... → keepLast = 0
If compaction succeeds but history length doesn’t shrink (ineffective), keepLast jumps straight to 0 to avoid wasting attempts.
At keepLast = 0, all messages are compacted and the preserved set is empty. The conversation is rebuilt as system prompt + compaction summary.
If keepLast = 0 still exceeds context (system prompt + summary alone are too large), a warning is displayed with current context state (token usage, context length, message count) and a suggestion to use /compact manually or start a new session. The error is then returned as nonrecoverable.
If no compaction agent is configured, recovery is not attempted — an immediate error explains that no agent is available.
Recovery retries do not count against max_steps.
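The keepLast descent can be sketched as a loop over decreasing preservation counts; the ineffective-pass jump to 0 is noted in the text above and omitted here for brevity. A Go sketch, with `tryCompact` standing in for one compaction pass plus a fits-in-context check:

```go
package main

import "fmt"

// recoverCompaction retries with progressively lower keepLast until the
// rebuilt history fits the context budget; if even keepLast = 0 fails,
// the error is nonrecoverable.
func recoverCompaction(startKeep int, tryCompact func(keepLast int) bool) (int, bool) {
	for k := startKeep; k >= 0; k-- {
		if tryCompact(k) {
			return k, true
		}
	}
	return 0, false // system prompt + summary alone exceed context
}

func main() {
	// Pretend a large tool result sits inside the last 8 messages, so
	// only keepLast <= 7 frees enough context.
	keep, ok := recoverCompaction(10, func(k int) bool { return k <= 7 })
	fmt.Println(keep, ok)
}
```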
## Plugin Hooks
Two plugin hook timings interact with compaction:
`BeforeCompaction` fires before compaction begins (both auto-triggered and manual `/compact`). Plugins can skip built-in compaction by returning `sdk.Result{Compaction: &sdk.CompactionModification{Skip: true}}`. This lets a context-management plugin (e.g., DCP) take full control of context optimization without Aura’s compaction interfering. Context fields: `Forced`, `TokensUsed`, `ContextPercent`, `MessageCount`, `KeepLast`.
`AfterCompaction` fires after compaction completes (both success and failure). Plugins can observe what happened — log summaries, alert on failures, trigger external archival — but cannot modify the compaction result. Context fields: `Success`, `PreMessages`, `PostMessages`, `SummaryLength`.
See Plugins for the hook signatures.
When no compaction agent is configured (prune-only path), neither hook fires.
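A `BeforeCompaction` handler might look like the sketch below. The struct definitions are local stand-ins for the SDK types named above (only `Result`, `CompactionModification{Skip}`, and the documented context fields come from the text; the handler signature and everything else is assumed):

```go
package main

import "fmt"

// Local stand-ins for the plugin SDK types; real signatures live in
// Aura's SDK and may differ.
type CompactionModification struct{ Skip bool }
type Result struct{ Compaction *CompactionModification }
type BeforeCompactionCtx struct {
	Forced         bool // true for manual /compact
	TokensUsed     int
	ContextPercent int
	MessageCount   int
	KeepLast       int
}

// skipWhenNotForced lets built-in compaction run only for manual
// /compact, deferring automatic triggers to an external
// context-management plugin.
func skipWhenNotForced(ctx BeforeCompactionCtx) Result {
	if !ctx.Forced {
		return Result{Compaction: &CompactionModification{Skip: true}}
	}
	return Result{} // no modification: built-in compaction proceeds
}

func main() {
	r := skipWhenNotForced(BeforeCompactionCtx{Forced: false, ContextPercent: 82})
	fmt.Println(r.Compaction != nil && r.Compaction.Skip)
}
```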
## Pruning
Pruning removes low-value tool call arguments from older messages to reclaim context space before it fills up. Unlike compaction, pruning does not summarize — it only strips arguments that have already been consumed.
Three modes are available:
| Mode | Behavior |
|---|---|
| `off` | Disabled (default) |
| `iteration` | Prune after each tool-use loop iteration |
| `compaction` | Prune during context compaction |
Only tool call arguments exceeding arg_threshold estimated tokens are candidates. The most recent messages, covering protect_percent of the context window, are never touched.
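The two eligibility rules reduce to a simple predicate. A Go sketch (names are illustrative; `protectedFrom` is the index of the first message inside the protected recent window):

```go
package main

import "fmt"

// prunable reports whether a tool call's arguments may be stripped:
// the estimated argument size must exceed arg_threshold tokens, and the
// message must lie before the protected recent window.
func prunable(argTokens, msgIndex, protectedFrom, argThreshold int) bool {
	return argTokens > argThreshold && msgIndex < protectedFrom
}

func main() {
	// Messages at index >= 6 fall inside the protected window.
	fmt.Println(prunable(500, 2, 6, 200)) // large args, old message
	fmt.Println(prunable(500, 7, 6, 200)) // too recent: protected
	fmt.Println(prunable(50, 2, 6, 200))  // args below threshold
}
```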
Configure under `compaction.prune:` in `features/compaction.yaml`:

```yaml
compaction:
  prune:
    mode: "off"           # "off", "iteration", "compaction"
    protect_percent: 30   # % of context window to protect from pruning (most recent messages)
    arg_threshold: 200    # min estimated tokens for tool call args to be prunable
```