Feature Configuration

Features are configurable capabilities with their own YAML config files in .aura/config/features/. Features that need LLM calls use a dedicated hidden agent defined in .aura/config/agents/features/.

Feature Agent Resolution

Several features delegate work to an LLM. Four features — Compaction, Thinking, Guardrail, and Title — use the ResolveAgent() framework, which supports two resolution patterns:

  1. agent: — Uses a dedicated hidden agent with its own model/provider. Separate API call. Better quality, uses a purpose-built prompt.
  2. prompt: — Reuses the current agent’s model with a named system prompt from prompts/. No extra provider config needed. Cheaper (same model, single call context).

Resolution Order

For Compaction, Thinking, Guardrail, and Title, the resolution follows a 2-tier order:

  1. prompt: set → self-use with current model + named prompt
  2. agent: set → create dedicated agent on demand (its model, provider, prompt)

If neither is set, behavior depends on the feature: Compaction falls back to prune-only, Title falls back to first user message, others return an error.

Agents are created on demand — no caching. agent.New() runs each time the feature fires (during spinner display, off the critical path).

Code: internal/assistant/resolve.go.
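
As an illustrative sketch, the two patterns side by side for compaction (keys from the Compaction section below; the prompt name compact is hypothetical):

```yaml
# Pattern 1: dedicated hidden agent (separate API call, purpose-built prompt).
compaction:
  agent: Compaction
---
# Pattern 2: self-use with a named prompt from prompts/
# (takes precedence over agent: when both are set).
compaction:
  prompt: "compact" # hypothetical prompt name
```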

Which Features Support Which Pattern

| Feature | agent: | prompt: | Resolution | Notes |
|---|---|---|---|---|
| Compaction | Yes | Yes | ResolveAgent() | Falls back to prune-only if neither configured |
| Thinking | Yes | Yes | ResolveAgent() | Only runs when thinking blocks cross the keep_last boundary |
| Guardrail | Yes | Yes | ResolveAgent() | Per-scope: scope.tool_calls.agent/prompt, scope.user_messages.agent/prompt |
| Title | Yes | Yes | ResolveAgent() | Falls back to first user message if neither configured |
| Vision, STT, TTS, Embeddings, Reranker | Yes | No | Config-only | Agent name provides model/provider; no runtime resolution |

Compaction, Thinking, Guardrail, and Title use the ResolveAgent() framework with prompt/agent duality. Vision, STT, TTS, Embeddings, and Reranker store an agent name in config for model/provider reference but have no runtime resolution logic — they are configured, not resolved.

When to Use Which

For features that support both patterns (Compaction, Thinking, Guardrail, Title):

  • Use agent: when you want a small local model (e.g., Ollama) for cheap internal tasks while using a cloud model for conversation.
  • Use prompt: when running on a cloud provider and you want to avoid extra API calls — the current model handles the feature in-context.
  • Use prompt: with Ollama when your local model is powerful enough to handle both conversation and internal tasks.

Compaction

File: features/compaction.yaml

Automatically compresses conversation history when the context window fills up.

compaction:
  # Context fill percentage that triggers auto-compaction (1-100).
  threshold: 80

  # Absolute token count that triggers compaction (overrides threshold when set).
  # max_tokens: 32000

  # Fill percentage at which duplicate synthetic messages are removed.
  # Delays compaction by trimming injected messages first. Must be < threshold.
  trim_threshold: 50

  # Absolute token count that triggers trimming (overrides trim_threshold when set).
  # trim_max_tokens: 16000

  # Number of most recent messages to preserve during compaction.
  keep_last_messages: 10

  # Number of chunks for sequential compaction.
  # 1 = single-pass. N > 1 splits history into N chunks,
  # compacts each sequentially (each chunk's summary feeds into the next).
  chunks: 1

  # Agent for generating the compaction summary.
  agent: Compaction

  # Named prompt for self-compaction (overrides agent).
  # When set, uses the current agent's model with this prompt instead
  # of delegating to a separate agent. Mirrors thinking's self-rewrite pattern.
  # prompt: ""

  # Max characters for tool results in the compaction transcript.
  tool_result_max_length: 200

  # Retry sequence with progressively shorter tool results on failure.
  truncation_retries: [150, 100, 50, 0]

  # Tool result pruning to keep context lean.
  prune:
    mode: "off" # "off", "iteration", "compaction"
    protect_percent: 30 # % of context to protect from pruning
    arg_threshold: 200 # min tokens for tool call args to be prunable
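
As a sketch, a variant of the block above that compacts in two sequential chunks and prunes tool results on every iteration (all keys from the config above):

```yaml
compaction:
  threshold: 80
  chunks: 2             # chunk 1's summary feeds into chunk 2
  prune:
    mode: "iteration"   # prune tool results on every iteration
    protect_percent: 30 # 30% of context is protected from pruning
```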

Embeddings

File: features/embeddings.yaml

Embedding-based codebase search with optional reranking.

embeddings:
  # Agent for generating text embeddings.
  agent: "Embeddings"

  # Default number of search results.
  max_results: 5

  # Gitignore-style patterns controlling which files get indexed.
  # Uses standard gitignore syntax.
  gitignore: |
    *
    !*/
    !src/**/*.go

  # Explicitly load/unload the embedding model between pipeline stages.
  # Frees VRAM after embedding so reranking can load its own model.
  # Useful on VRAM-constrained devices. No-op if provider doesn't support it.
  offload: false

  chunking:
    # How files are split for embedding: auto, ast, line.
    # auto = AST for supported languages (tree-sitter), line for everything else.
    strategy: "auto"

    # Target maximum tokens per chunk.
    max_tokens: 500

    # Overlap tokens between adjacent chunks.
    overlap_tokens: 75

  reranking:
    # Agent for reranking results after similarity search.
    # Set to "" to skip reranking.
    agent: "Reranker"

    # Fetch max_results * multiplier candidates, then rerank down.
    multiplier: 4

    # Explicitly load/unload the reranker model around reranking.
    # Frees VRAM after reranking completes.
    offload: false
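
With the defaults above, a search fetches max_results * multiplier = 5 * 4 = 20 candidates by similarity, then reranks them down to 5. A sketch for a VRAM-constrained device that unloads each model between stages:

```yaml
embeddings:
  max_results: 5
  offload: true     # free embedding-model VRAM so reranking can load its model
  reranking:
    agent: "Reranker"
    multiplier: 4   # fetch 20 candidates, rerank down to 5
    offload: true   # free reranker VRAM after reranking completes
```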

Thinking

File: features/thinking.yaml

Extended thinking/reasoning block management.

thinking:
  # Agent for rewriting thinking blocks.
  # Only used when an agent has thinking: rewrite in its frontmatter.
  agent: "Thinking"

  # Named prompt for self-rewrite (overrides agent).
  # When set, uses the current agent's model with this prompt instead
  # of delegating to a separate agent. Mirrors compaction's self-compact pattern.
  # prompt: ""

  # Number of most recent messages whose thinking blocks are always preserved.
  keep_last: 5

  # Minimum token count for a thinking block to be affected by strip/rewrite.
  # Blocks below this threshold are left alone (small thinking blocks are cheap to keep).
  token_threshold: 300

Thinking handling is set per-agent in agent frontmatter:

  • thinking: "" — keep thinking blocks as-is (default)
  • thinking: "strip" — strip thinking from older messages that exceed the token threshold, honoring keep_last
  • thinking: "rewrite" — condense older thinking via a dedicated agent or self-rewrite

Agent resolution follows the two-tier order described above (same as compaction):

  1. prompt: set → self-rewrite with current model + named prompt
  2. agent: set → use a dedicated Thinking agent

If neither is set, rewrite mode returns an error.
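
A sketch of the self-rewrite pattern (the prompt name condense-thinking is hypothetical):

```yaml
thinking:
  prompt: "condense-thinking" # hypothetical named prompt; takes precedence over agent:
  keep_last: 5                # thinking in the 5 most recent messages is untouched
  token_threshold: 300        # blocks under 300 tokens are left alone
```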

Vision

File: features/vision.yaml

LLM-based image analysis and text extraction.

vision:
  # Agent for vision calls (must support vision).
  agent: "Vision"

  # Max pixel dimension for image compression.
  dimension: 1024

  # JPEG compression quality (1-100).
  quality: 75

Speech-to-Text (STT)

File: features/stt.yaml

Audio transcription via a whisper-compatible server.

stt:
  # Agent for transcription calls (provider must support /v1/audio/transcriptions).
  agent: "Transcribe"

  # ISO-639-1 language hint (e.g. "en", "de", "ja"). Empty = auto-detect.
  language: ""

Text-to-Speech (TTS)

File: features/tts.yaml

Speech synthesis via an OpenAI-compatible TTS server.

tts:
  # Agent for synthesis calls (provider must support /v1/audio/speech).
  agent: "Speak"

  # Default voice identifier (depends on TTS server — e.g. "alloy" for OpenAI, "af_heart" for Kokoro).
  voice: "alloy"

  # Default output audio format: mp3, opus, aac, flac, wav, pcm.
  format: "mp3"

  # Default playback speed (0.25-4.0).
  speed: 1.0

Title

File: features/title.yaml

Automatic session title generation.

title:
  # Set to true to skip LLM title generation.
  disabled: false

  # Agent for title generation (dedicated model/provider).
  agent: "Title"

  # Named prompt for self-title (overrides agent — uses current model).
  prompt: ""

  # Maximum character length for titles.
  max_length: 50

Title uses the same ResolveAgent() framework as Compaction and Thinking: prompt: uses the current model with a named system prompt, agent: creates a dedicated agent on demand. If neither is set, falls back to using the first user message as the title (no LLM call).

Tools

File: features/tools.yaml

Tool execution guards and limits.

tools:
  # Guard mode: "tokens" (fixed limit) or "percentage" (context-fill based).
  mode: percentage

  # Tool result size guard.
  result:
    # Max estimated tokens for a single tool result (mode: tokens).
    max_tokens: 20000
    # Max context-fill percentage after adding result (mode: percentage).
    max_percentage: 95

  # Token threshold below which Read tool returns full file.
  read_small_file_tokens: 2000

  # Maximum total iterations before hard stop.
  max_steps: 50

  # Cumulative token limit (input + output). Once reached, the assistant stops.
  # 0 = disabled (default). Override per-run with --token-budget.
  token_budget: 0

  # Message returned when a tool result is rejected.
  rejection_message: >-
    Error: Tool result too large (%d tokens, limit %d).
    Try a more specific query or use offset/limit parameters.

  # Bash tool configuration.
  bash:
    # Output truncation.
    # Byte cap: stdout/stderr each capped at max_output_bytes during capture (prevents OOM).
    # Line truncation: output exceeding max_lines is middle-truncated (first head + last tail lines).
    # Full output is saved to a temp file referenced in the truncation message.
    truncation:
      max_output_bytes: 1048576 # 1MB per stream; 0 = disabled
      max_lines: 200
      head_lines: 100
      tail_lines: 80
    # Go text/template applied to every Bash tool command before execution.
    # Receives {{ .Command }} (original command) and sprig functions.
    # Empty = no rewrite.
    rewrite: ""

  # Max response body size in bytes for WebFetch tool (default: 5MB).
  webfetch_max_body_size: 5242880

  # Enable parallel execution of independent tool calls within a single LLM turn.
  # Omit or set true = parallel (default). false = sequential.
  # parallel: false

  # Max context-fill percentage after adding a user message.
  # Prevents massive input from causing unrecoverable context exhaustion.
  # 0 = disabled.
  user_input_max_percentage: 80

  # Read-before enforcement policy. Toggle at runtime with /readbefore or /rb.
  read_before:
    write: true    # require read before overwriting existing files (default: true)
    delete: false  # require read before deleting files (default: false)

  # Default tool include patterns (glob). Empty = all tools.
  # Equivalent to --include-tools CLI flag. CLI flags override this.
  enabled: []

  # Default tool exclude patterns (glob). Empty = none excluded.
  # Equivalent to --exclude-tools CLI flag. CLI flags override this.
  disabled: ["mcp__*"]

  # Tools hidden unless explicitly named in an enabled list.
  # The bare "*" wildcard does NOT satisfy opt-in — only named references do.
  opt_in: []

  # Glob patterns for tools to defer (e.g., ["Vision*", "Speak", "mcp__github*"]).
  # Deferred tools are excluded from request.Tools and instead listed in a
  # lightweight prompt block. The LoadTools meta-tool loads them on demand,
  # and once loaded they persist for the remainder of the session.
  deferred: []
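
As an illustrative sketch, a bash.rewrite template that wraps every command in a timeout; {{ .Command }} is the original command, per the comments above (the timeout wrapper itself is a hypothetical choice):

```yaml
tools:
  bash:
    rewrite: "timeout 60s {{ .Command }}"
```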

Sandbox

File: features/sandbox.yaml

Filesystem sandbox via Landlock, restricting the filesystem paths that the LLM's tools can access.

sandbox:
  # Global enforcement toggle.
  enabled: true

  # Filesystem restriction lists (support $HOME expansion).
  restrictions:
    ro: []   # read-only paths
    rw: []   # read-write paths

  # Additional paths additive on top of restrictions.
  # Merged from agent and mode frontmatter features.sandbox.extra.
  # extra:
  #   ro: []
  #   rw: []

Toggle at runtime with /sandbox or /landlock.

The sandbox has two runtime states: requested (what the user/config wants) and enabled (whether Landlock is actually enforcing). These can differ when the kernel does not support Landlock — requested is true but enabled is false. Both states are available in prompt templates as {{ .Sandbox.Requested }} and {{ .Sandbox.Enabled }}.
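
The two states can be surfaced in a prompt template, for example (a sketch using the variables above):

```
{{ if and .Sandbox.Requested (not .Sandbox.Enabled) }}
Note: the sandbox was requested but the kernel does not support Landlock, so it is not enforced.
{{ end }}
```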

Subagent

File: features/subagent.yaml

Configuration for the subagent runner — a one-shot mini tool loop with isolated conversation context used by the Task tool.

subagent:
  # Maximum tool-use iterations for a single subagent run.
  max_steps: 25

  # Default agent used when no agent is specified in the Task tool call.
  # Empty = use the parent (calling) agent.
  default_agent: ""

Feature resolution: All subagents resolve their own features independently — the merge chain is global → child agent → child mode, not the parent’s merged features. This means a subagent with features.subagent.max_steps: 50 uses that value regardless of what the parent agent sets. When no agent name is specified, it resolves to default_agent, then falls back to the parent’s agent name — but always creates a fresh, isolated instance.
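
For example, a sketch of a child agent's frontmatter features: block (schema per the Agents docs):

```yaml
features:
  subagent:
    max_steps: 50 # this subagent's limit, regardless of the parent's setting
```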

Plugins

File: features/plugins.yaml

Configuration for Go plugin loading.

plugins:
  # Plugin directory relative to config home. Empty = "plugins/".
  dir: ""

  # Allow loading plugins compiled without safety checks.
  # Set to true only if you trust all plugins in .aura/plugins/.
  unsafe: false

  # Load only matching plugin names. Empty = load all.
  include: []

  # Skip matching plugin names. Applied after include.
  exclude: []

  # Plugin configuration values (3-layer merge: plugin.yaml → global → local).
  # Merge order: plugin.yaml defaults (lowest) → global → local (highest).
  config:
    # Key-value pairs sent to ALL plugins via sdk.Context.PluginConfig.
    global: {}
    # Per-plugin overrides keyed by plugin name.
    local: {}
    # local:
    #   failure-circuit-breaker:
    #     max_failures: 5
    #   todo-reminder:
    #     interval: 10
    #   session-stats:
    #     interval: 10
    #     top_tools: 5

MCP

File: features/mcp.yaml

Server-level filtering for MCP connections. Same glob pattern system as tool filtering.

mcp:
  # Default include patterns (glob). Empty = connect all enabled servers.
  # Equivalent to --include-mcps CLI flag. CLI flags override this.
  enabled: []

  # Default exclude patterns (glob). Empty = skip none.
  # Equivalent to --exclude-mcps CLI flag. CLI flags override this.
  disabled: []

CLI flags: --include-mcps "context7,git*", --exclude-mcps "portainer".

Guardrail

File: features/guardrail.yaml

Secondary LLM validation of tool calls and user messages. Disabled by default. Each scope (tool_calls, user_messages) is independently configurable with its own agent or prompt.

guardrail:
  # "block" (reject), "log" (notice + proceed), "" (disabled, default).
  mode: ""

  # Error policy — what happens when the guardrail check itself fails
  # (timeout, network error, model unavailable).
  # "block" = fail-closed (default when mode: block).
  # "allow" = fail-open (default when mode: log).
  # Omit to inherit from mode.
  # on_error: ""

  # Max duration per check. Default: 2m when enabled.
  timeout: 2m

  # Independent scopes — a scope is active when agent or prompt is set.
  scope:
    tool_calls:
      agent: ""        # dedicated guardrail agent name
      prompt: ""       # OR: named prompt for self-guardrail (uses current model)
    user_messages:
      agent: ""        # dedicated guardrail agent name
      prompt: ""       # OR: named prompt for self-guardrail (uses current model)

  # Filter which tools trigger guardrail checks (tool_calls scope only).
  tools:
    enabled: []        # glob patterns — only check matching tools (empty = all)
    disabled: []       # glob patterns — skip matching tools (applied after enabled)

See Guardrail feature for detailed behavior, response protocol, and error handling.
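
As a sketch, enabling the tool_calls scope in log mode with self-guardrail (the prompt name guard and the tool patterns are hypothetical):

```yaml
guardrail:
  mode: "log"       # notice + proceed
  scope:
    tool_calls:
      prompt: "guard" # hypothetical named prompt; current model checks each call
  tools:
    enabled: ["Bash", "Write*"] # only these tools trigger checks
```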

Estimation

File: features/estimation.yaml

Token estimation method used for context-fill calculations and tool result guards.

estimation:
  # Estimation algorithm: "rough", "tiktoken", "rough+tiktoken", "native".
  # rough        = chars / divisor
  # tiktoken     = tiktoken encoding
  # rough+tiktoken = max of both (default)
  # native       = provider tokenization (Anthropic, OpenAI, Google, Ollama, LlamaCPP)
  method: "rough+tiktoken"

  # Chars-per-token divisor for rough estimation.
  divisor: 4

  # Tiktoken encoding name.
  encoding: "cl100k_base"
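
For example, with the defaults above a 2,000-character message is roughly estimated at 2000 / 4 = 500 tokens; rough+tiktoken then reports the larger of that and the tiktoken count. Forcing rough-only estimation is a one-line change:

```yaml
estimation:
  method: "rough" # 2000 chars / divisor 4 = ~500 estimated tokens
```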

Override Precedence

Feature values can be overridden at multiple levels. Each level merges non-zero values on top of the previous:

  1. Global defaults: features/*.yaml files (+ ApplyDefaults() for omitted fields)
  2. CLI flags: --max-steps overrides tools.max_steps before agent resolution
  3. Agent frontmatter: the features: block in agent .md files (see Agents)
  4. Mode frontmatter: the features: block in mode .md files (see Modes)
  5. Task definition: the features: block in task YAML (see Tasks)

Only non-zero values override — omitting a field preserves the value from the previous level. This means an agent can override compaction.threshold without affecting compaction.keep_last_messages.
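
As a sketch, an agent frontmatter features: block overriding only the compaction threshold (schema per the Agents docs):

```yaml
features:
  compaction:
    threshold: 70 # overrides the global 80
    # keep_last_messages omitted: the global value (10) is preserved
```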



Copyright © 2026 idelchi. Distributed under the MIT License.