Context Compaction

Compaction works out of the box with no configuration. When your conversation fills the context window, Aura automatically compresses older messages into a summary, freeing space for new content. The default settings handle most use cases — tune only if needed.

How It Works

  1. Synthetic trim — At 50% context fill (configurable), duplicate synthetic messages are removed to delay compaction.
  2. Auto-compaction — At 80% context fill (configurable), compaction triggers automatically.
  3. Split — History is divided into messages to compact and messages to preserve (last 10 by default). Internal messages (DisplayOnly, Bookmark, Metadata) do not count toward the preserved message count.
  4. Preprocess — Messages to compact are stripped of synthetic, internal, and ephemeral messages, thinking blocks are removed, and tool results are truncated.
  5. Summarize — The compaction agent receives the preprocessed messages as structured conversation history and generates a summary that preserves structured data (requirements, checklists, decisions) while compressing narrative.
  6. Rebuild — History is reconstructed: system prompt + compaction summary + preserved messages.
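The split and rebuild steps above can be sketched in Python. This is an illustrative model, not Aura's actual code: the `internal` flag and message shapes are assumptions.

```python
def split_history(history, keep_last=10):
    """Step 3: divide history into (to_compact, preserved).
    Internal messages are kept but don't count toward keep_last."""
    preserved = []
    counted = 0
    i = len(history)
    while i > 0 and counted < keep_last:
        i -= 1
        preserved.append(history[i])
        if not history[i].get("internal", False):
            counted += 1
    preserved.reverse()
    return history[:i], preserved

def rebuild(system_prompt, summary, preserved):
    """Step 6: system prompt + compaction summary + preserved messages."""
    return [system_prompt, {"role": "user", "content": summary}] + preserved
```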

What Gets Preserved

The compaction prompt is designed to preserve structured data through summarization:

  • Requirement checklists and acceptance criteria — reproduced verbatim, not paraphrased
  • File paths, package names, and dependency choices — specific identifiers that were decided or required
  • Explicit decisions and rationale — only decisions that were stated in conversation, never inferred from errors or work-in-progress state
  • Todo state — mechanically preserved (not LLM-generated) and appended to the compaction summary

When a requirement and a conversation action contradict each other (e.g., spec says “use pflag” but code switched to “flag”), the compaction notes both the requirement and the actual state rather than silently resolving the contradiction.

Manual Trigger

Use /compact to trigger compaction manually at any time.

Configuration

| Setting | Default | Description |
|---|---|---|
| threshold | 80 | Context fill % that triggers auto-compaction |
| max_tokens | 0 | Absolute token count that triggers compaction (overrides threshold when set) |
| trim_threshold | 50 | Fill % for synthetic message trimming |
| trim_max_tokens | 0 | Absolute token count that triggers trimming (overrides trim_threshold when set) |
| keep_last_messages | 10 | Messages preserved during compaction |
| chunks | 1 | Number of chunks for sequential compaction (1 = single-pass) |
| agent | Compaction | Agent for generating summaries |
| prompt | (unset) | Named prompt for self-compaction (overrides agent) |
| tool_result_max_length | 200 | Max chars for tool results in compaction messages |
| prune.mode | off | When to prune old tool results: off, iteration, compaction |
| prune.protect_percent | 30 | % of context window to protect from pruning |
| prune.arg_threshold | 200 | Min estimated tokens for tool call args to be prunable |
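Put together, the defaults above correspond to a configuration along these lines. This is a sketch; the exact layout of the shipped features/compaction.yaml may differ.

```yaml
compaction:
  threshold: 80
  max_tokens: 0
  trim_threshold: 50
  trim_max_tokens: 0
  keep_last_messages: 10
  chunks: 1
  agent: "Compaction"
  tool_result_max_length: 200
  prune:
    mode: "off"
    protect_percent: 30
    arg_threshold: 200
```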

Threshold modes

Two threshold modes are available; when both are configured, max_tokens takes priority:

  • Percentage mode (default): threshold: 80 triggers compaction when token usage exceeds 80% of the context window. Requires a known context length (from the provider, registry, or the agent’s context: field).

  • Absolute mode: max_tokens: 32000 triggers compaction when the token count exceeds this value. Does not require knowing the context window size. Useful as a safety net when the provider doesn’t report context length, or for provider-independent thresholds.

The same applies to trim_threshold vs trim_max_tokens.
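The trigger decision behaves roughly like this sketch (illustrative, not Aura's implementation):

```python
def should_compact(tokens_used, context_length, threshold=80, max_tokens=0):
    """Absolute mode takes priority when max_tokens is set (> 0)."""
    if max_tokens > 0:
        return tokens_used > max_tokens
    if context_length <= 0:
        # Percentage mode requires a known context length.
        raise ValueError("context length unknown: set context: or use max_tokens")
    return tokens_used > context_length * threshold / 100
```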

Context length

The agent’s context: field sets the effective context window size for compaction decisions. User-configured values always take priority over provider-reported values.

  • Ollama: Sets the actual context window (num_ctx) on the server AND uses it as the compaction denominator.
  • All other providers: The context: field is used only as the compaction denominator — it does not affect the provider’s actual context window.

When context length is unknown (provider reports 0, no context: field, no max_tokens), compaction cannot function and an error is raised.
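As an illustration, an agent pinned to a 32k window might declare it in its frontmatter. The placement of the context: field shown here is an assumption based on the field name above; check your agent schema.

```yaml
---
name: LocalAgent
model:
  provider: ollama
  name: llama3.1
context: 32768
---
```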

See Compaction Config for the full YAML.

Per-Agent Overrides

Agents can override compaction behavior via features.compaction in their frontmatter. Per-agent overrides are merged on top of the global defaults — no separate config surface needed.

Resolution Order

  1. prompt is set → self-compact: use the current agent’s own model with the named prompt from prompts/. The dedicated compaction agent is bypassed entirely.
  2. agent is set → use that dedicated agent (its model, provider, and prompt).
  3. Neither set → use the default agent from compaction.yaml.
  4. No agent or prompt configured at all → prune-only: mechanical pruning runs but LLM summarization is skipped. The conversation continues without compaction errors.
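The resolution order above can be modeled as a small function; the names and return values are illustrative, not Aura's API:

```python
def resolve_compactor(prompt=None, agent=None, default_agent=None):
    """Pick the compaction summarizer, in priority order."""
    if prompt:
        return ("self-compact", prompt)       # current agent's own model
    if agent:
        return ("dedicated-agent", agent)     # that agent's model and prompt
    if default_agent:
        return ("default-agent", default_agent)
    return ("prune-only", None)               # mechanical pruning, no LLM summary
```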

Examples

Self-compact with a prompt (agent uses its own model):

```yaml
---
name: MyAgent
model:
  provider: openrouter
  name: anthropic/claude-sonnet
features:
  compaction:
    prompt: "Compaction"
---
```

Use a different dedicated agent:

```yaml
---
name: LightAgent
features:
  compaction:
    agent: "FastCompactor"
---
```

Tweak thresholds only (keep default agent):

```yaml
---
name: BigContextAgent
features:
  compaction:
    threshold: 95
    keep_last_messages: 20
---
```

Advanced Configuration

Chunked Compaction

When chunks is set to N > 1, the compactable messages are split into N chunks and compacted sequentially. Each chunk’s summary feeds into the next chunk as context, producing a single coherent summary that preserves more detail than a single-pass compaction of the same history.

Chunk boundaries respect tool call/result pairs — a tool result is never separated from its corresponding assistant message.

Todo state is only included in the last chunk’s prompt to avoid contaminating intermediate summaries.
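The boundary rule can be illustrated with a sketch that nudges a chunk edge forward whenever it would split an assistant tool call from its result. Message shapes (a role field, tool results using role "tool") are assumptions for illustration:

```python
def chunk_messages(messages, n):
    """Split into n roughly equal chunks, never starting a chunk
    on a tool result (which must stay with its assistant message)."""
    size = max(1, len(messages) // n)
    chunks, start = [], 0
    while start < len(messages):
        end = min(start + size, len(messages))
        # Push the boundary past any tool results so pairs stay together.
        while end < len(messages) and messages[end].get("role") == "tool":
            end += 1
        chunks.append(messages[start:end])
        start = end
    return chunks
```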

Progressive Retry

If compaction fails (e.g., the conversation is too large), it retries with progressively shorter tool result content:

200 chars → 150 chars → 100 chars → 50 chars → 0 chars

For chunked compaction, the retry wraps the entire sequence — if any chunk fails, all chunks are retried at the next truncation level.
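The retry loop amounts to the following sketch, where `compact_fn` stands in for one compaction attempt (or one full chunked sequence):

```python
def compact_with_retry(compact_fn, levels=(200, 150, 100, 50, 0)):
    """Retry compaction with progressively shorter tool-result content.
    compact_fn(max_len) returns a summary or raises on failure."""
    last_err = None
    for max_len in levels:
        try:
            return compact_fn(max_len)
        except Exception as err:
            last_err = err
    raise last_err
```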

Compaction Recovery

Both the auto-compaction trigger (at threshold) and context-exhaustion errors use the same recovery method: RecoverCompaction. If a single compaction attempt doesn’t free enough context (e.g., a large tool result is trapped in the preserved tail), it retries with progressively lower keep_last_messages:

keepLast = 10 → compact → still exceeds → keepLast = 9 → ... → keepLast = 0

If compaction succeeds but history length doesn’t shrink (ineffective), keepLast jumps straight to 0 to avoid wasting attempts.

At keepLast = 0, all messages are compacted and the preserved set is empty. The conversation is rebuilt as system prompt + compaction summary.

If keepLast = 0 still exceeds context (system prompt + summary alone are too large), a warning is displayed with current context state (token usage, context length, message count) and a suggestion to use /compact manually or start a new session. The error is then returned as nonrecoverable.

If no compaction agent is configured, recovery is not attempted — an immediate error explains that no agent is available.

Recovery retries do not count against max_steps.
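The recovery descent can be sketched as follows; `attempt` and `fits` stand in for one compaction pass and the context-fit check, and the shapes are illustrative:

```python
def recover_compaction(attempt, fits, history_len, keep_last=10):
    """Lower keep_last until the compacted history fits.
    attempt(keep_last) returns the new history length;
    fits(length) reports whether it fits the context window."""
    while keep_last >= 0:
        new_len = attempt(keep_last)
        if fits(new_len):
            return keep_last
        if new_len >= history_len and keep_last > 0:
            keep_last = 0          # ineffective: jump straight to zero
        else:
            history_len = new_len
            keep_last -= 1
    raise RuntimeError("compaction cannot free enough context")
```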

Plugin Hooks

Two plugin hook timings interact with compaction:

BeforeCompaction fires before compaction begins (both auto-triggered and manual /compact). Plugins can skip built-in compaction by returning sdk.Result{Compaction: &sdk.CompactionModification{Skip: true}}. This lets a context-management plugin (e.g., DCP) take full control of context optimization without Aura’s compaction interfering. Context fields: Forced, TokensUsed, ContextPercent, MessageCount, KeepLast.

AfterCompaction fires after compaction completes (both success and failure). Plugins can observe what happened — log summaries, alert on failures, trigger external archival — but cannot modify the compaction result. Context fields: Success, PreMessages, PostMessages, SummaryLength.

See Plugins for the hook signatures.

When no compaction agent is configured (prune-only path), neither hook fires.

Pruning

Pruning removes low-value tool call arguments from older messages to reclaim context space before it fills up. Unlike compaction, pruning does not summarize — it only strips arguments that have already been consumed.

Three modes are available:

| Mode | Behavior |
|---|---|
| off | Disabled (default) |
| iteration | Prune after each tool-use loop iteration |
| compaction | Prune during context compaction |

Only tool call arguments exceeding arg_threshold estimated tokens are candidates. The most recent messages, covering protect_percent of the context window, are never touched.

Configure under compaction.prune: in features/compaction.yaml:

```yaml
compaction:
  prune:
    mode: "off"           # "off", "iteration", "compaction"
    protect_percent: 30   # % of context window to protect from pruning (most recent messages)
    arg_threshold: 200    # min estimated tokens for tool call args to be prunable
```
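Candidate selection can be sketched as below, assuming per-message token estimates are available; the field names are illustrative, not Aura's internals:

```python
def prunable(messages, context_length, protect_percent=30, arg_threshold=200):
    """Return indices of tool calls whose args may be pruned.
    Walks backward, protecting the most recent protect_percent of the window."""
    budget = context_length * protect_percent / 100
    used, candidates = 0, []
    for i in range(len(messages) - 1, -1, -1):
        used += messages[i]["tokens"]
        if used <= budget:
            continue  # inside the protected tail: never touched
        if messages[i].get("tool_args_tokens", 0) > arg_threshold:
            candidates.append(i)
    return candidates
```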


Copyright © 2026 idelchi. Distributed under the MIT License.