Feature Configuration
Features are configurable capabilities with their own YAML config files in .aura/config/features/. Features that need LLM calls use a dedicated hidden agent defined in .aura/config/agents/features/.
Feature Agent Resolution
Several features delegate work to an LLM. Four features — Compaction, Thinking, Guardrail, and Title — use the ResolveAgent() framework, which supports two resolution patterns:
- `agent:` — Uses a dedicated hidden agent with its own model/provider. Separate API call. Better quality, uses a purpose-built prompt.
- `prompt:` — Reuses the current agent's model with a named system prompt from `prompts/`. No extra provider config needed. Cheaper (same model, single call context).
Resolution Order
For Compaction, Thinking, Guardrail, and Title, the resolution follows a 2-tier order:
- `prompt:` set → self-use with current model + named prompt
- `agent:` set → create dedicated agent on demand (its model, provider, prompt)
If neither is set, behavior depends on the feature: Compaction falls back to prune-only, Title falls back to first user message, others return an error.
Agents are created on demand — no caching. agent.New() runs each time the feature fires (during spinner display, off the critical path).
Code: internal/assistant/resolve.go.
Which Features Support Which Pattern
| Feature | agent: | prompt: | Resolution | Notes |
|---|---|---|---|---|
| Compaction | Yes | Yes | ResolveAgent() | Falls back to prune-only if neither configured |
| Thinking | Yes | Yes | ResolveAgent() | Only runs when thinking blocks cross keep_last boundary |
| Guardrail | Yes | Yes | ResolveAgent() | Per-scope: scope.tool_calls.agent/prompt, scope.user_messages.agent/prompt |
| Title | Yes | Yes | ResolveAgent() | Falls back to first user message if neither configured |
| Vision, STT, TTS, Embeddings, Reranker | Yes | No | Config-only | Agent name provides model/provider — no runtime resolution |
Compaction, Thinking, Guardrail, and Title use the ResolveAgent() framework with prompt/agent duality. Vision, STT, TTS, Embeddings, and Reranker store an agent name in config for model/provider reference but have no runtime resolution logic — they are configured, not resolved.
When to Use Which
For features that support both patterns (Compaction, Thinking, Guardrail, Title):
- Use `agent:` when you want a small local model (e.g., Ollama) for cheap internal tasks while using a cloud model for conversation.
- Use `prompt:` when running on a cloud provider and you want to avoid extra API calls — the current model handles the feature in-context.
- Use `prompt:` with Ollama when your local model is powerful enough to handle both conversation and internal tasks.
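A minimal sketch of the two patterns applied to one feature (the agent and prompt names here are illustrative, not shipped defaults):

```yaml
# Pattern A: dedicated hidden agent — separate API call with its own
# model/provider, defined in .aura/config/agents/features/.
compaction:
  agent: Compaction

# Pattern B: self-use — current agent's model plus a named system prompt
# from prompts/; no extra provider config.
# compaction:
#   prompt: compact-history   # hypothetical prompt name
```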
Compaction
File: features/compaction.yaml
Automatically compresses conversation history when the context window fills up.
```yaml
compaction:
  # Context fill percentage that triggers auto-compaction (1-100).
  threshold: 80
  # Absolute token count that triggers compaction (overrides threshold when set).
  # max_tokens: 32000
  # Fill percentage at which duplicate synthetic messages are removed.
  # Delays compaction by trimming injected messages first. Must be < threshold.
  trim_threshold: 50
  # Absolute token count that triggers trimming (overrides trim_threshold when set).
  # trim_max_tokens: 16000
  # Number of most recent messages to preserve during compaction.
  keep_last_messages: 10
  # Number of chunks for sequential compaction.
  # 1 = single-pass. N > 1 splits history into N chunks,
  # compacts each sequentially (each chunk's summary feeds into the next).
  chunks: 1
  # Agent for generating the compaction summary.
  agent: Compaction
  # Named prompt for self-compaction (overrides agent).
  # When set, uses the current agent's model with this prompt instead
  # of delegating to a separate agent. Mirrors thinking's self-rewrite pattern.
  # prompt: ""
  # Max characters for tool results in the compaction transcript.
  tool_result_max_length: 200
  # Retry sequence with progressively shorter tool results on failure.
  truncation_retries: [150, 100, 50, 0]
  # Tool result pruning to keep context lean.
  prune:
    mode: "off"          # "off", "iteration", "compaction"
    protect_percent: 30  # % of context to protect from pruning
    arg_threshold: 200   # min tokens for tool call args to be prunable
```
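As a concrete reading of the defaults: with a 128k-token context window, `threshold: 80` fires compaction at roughly 102,400 tokens, while `trim_threshold: 50` starts trimming injected messages around 64,000 tokens, delaying full compaction. A variant pinned to absolute token counts (a hypothetical configuration, using only the documented fields) might look like:

```yaml
# Hypothetical variant: trigger on absolute token counts instead of fill
# percentage (max_tokens / trim_max_tokens override the percentage fields).
compaction:
  max_tokens: 32000       # compact once history reaches ~32k tokens
  trim_max_tokens: 16000  # trim injected messages from ~16k tokens
  keep_last_messages: 10
```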
Embeddings
File: features/embeddings.yaml
Embedding-based codebase search with optional reranking.
```yaml
embeddings:
  # Agent for generating text embeddings.
  agent: "Embeddings"
  # Default number of search results.
  max_results: 5
  # Gitignore-style patterns controlling which files get indexed.
  # Uses standard gitignore syntax.
  gitignore: |
    *
    !*/
    !src/**/*.go
  # Explicitly load/unload the embedding model between pipeline stages.
  # Frees VRAM after embedding so reranking can load its own model.
  # Useful on VRAM-constrained devices. No-op if provider doesn't support it.
  offload: false
  chunking:
    # How files are split for embedding: auto, ast, line.
    # auto = AST for supported languages (tree-sitter), line for everything else.
    strategy: "auto"
    # Target maximum tokens per chunk.
    max_tokens: 500
    # Overlap tokens between adjacent chunks.
    overlap_tokens: 75
  reranking:
    # Agent for reranking results after similarity search.
    # Set to "" to skip reranking.
    agent: "Reranker"
    # Fetch max_results * multiplier candidates, then rerank down.
    multiplier: 4
    # Explicitly load/unload the reranker model around reranking.
    # Frees VRAM after reranking completes.
    offload: false
```
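Since the index scope uses standard gitignore semantics ("ignore everything, then re-include"), a broader sketch might index Go sources and docs while keeping vendored code out (paths here are illustrative):

```yaml
# Hypothetical index scope: all Go files plus docs, excluding vendor/.
# "*" ignores everything, "!*/" re-includes directories so traversal
# continues, "!" patterns re-include files; later patterns win.
embeddings:
  gitignore: |
    *
    !*/
    !**/*.go
    !docs/**/*.md
    vendor/
```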
Thinking
File: features/thinking.yaml
Extended thinking/reasoning block management.
```yaml
thinking:
  # Agent for rewriting thinking blocks.
  # Only used when an agent has thinking: rewrite in its frontmatter.
  agent: "Thinking"
  # Named prompt for self-rewrite (overrides agent).
  # When set, uses the current agent's model with this prompt instead
  # of delegating to a separate agent. Mirrors compaction's self-compact pattern.
  # prompt: ""
  # Number of most recent messages whose thinking blocks are always preserved.
  keep_last: 5
  # Minimum token count for a thinking block to be affected by strip/rewrite.
  # Blocks below this threshold are left alone (small thinking blocks are cheap to keep).
  token_threshold: 300
```
Thinking handling is set per-agent in agent frontmatter:
- `thinking: ""` — keep thinking blocks as-is (default)
- `thinking: "strip"` — strip thinking from older messages that exceed the token threshold, honoring `keep_last`
- `thinking: "rewrite"` — condense older thinking via a dedicated agent or self-rewrite
Agent resolution follows the same order as compaction:
- `prompt:` set → self-rewrite with current model
- `agent:` set → use dedicated Thinking agent
- Neither → use pre-built agent (error if not configured)
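For example, an agent that opts into rewrite mode and tightens the defaults could look like this frontmatter sketch (the agent file and values are illustrative):

```yaml
# .aura/config/agents/researcher.md frontmatter (hypothetical agent)
---
thinking: "rewrite"     # condense older thinking blocks instead of keeping them
features:
  thinking:
    keep_last: 3        # preserve thinking only in the 3 most recent messages
    token_threshold: 500
---
```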
Vision
File: features/vision.yaml
LLM-based image analysis and text extraction.
```yaml
vision:
  # Agent for vision calls (must support vision).
  agent: "Vision"
  # Max pixel dimension for image compression.
  dimension: 1024
  # JPEG compression quality (1-100).
  quality: 75
```
Speech-to-Text (STT)
File: features/stt.yaml
Audio transcription via a whisper-compatible server.
```yaml
stt:
  # Agent for transcription calls (provider must support /v1/audio/transcriptions).
  agent: "Transcribe"
  # ISO-639-1 language hint (e.g. "en", "de", "ja"). Empty = auto-detect.
  language: ""
```
Text-to-Speech (TTS)
File: features/tts.yaml
Speech synthesis via an OpenAI-compatible TTS server.
```yaml
tts:
  # Agent for synthesis calls (provider must support /v1/audio/speech).
  agent: "Speak"
  # Default voice identifier (depends on TTS server — e.g. "alloy" for OpenAI, "af_heart" for Kokoro).
  voice: "alloy"
  # Default output audio format: mp3, opus, aac, flac, wav, pcm.
  format: "mp3"
  # Default playback speed (0.25-4.0).
  speed: 1.0
```
Title
File: features/title.yaml
Automatic session title generation.
```yaml
title:
  # Set to true to skip LLM title generation.
  disabled: false
  # Agent for title generation (dedicated model/provider).
  agent: "Title"
  # Named prompt for self-title (overrides agent — uses current model).
  prompt: ""
  # Maximum character length for titles.
  max_length: 50
```
Title uses the same ResolveAgent() framework as Compaction and Thinking: prompt: uses the current model with a named system prompt, agent: creates a dedicated agent on demand. If neither is set, falls back to using the first user message as the title (no LLM call).
Tools
File: features/tools.yaml
Tool execution guards and limits.
```yaml
tools:
  # Guard mode: "tokens" (fixed limit) or "percentage" (context-fill based).
  mode: percentage
  # Tool result size guard.
  result:
    # Max estimated tokens for a single tool result (mode: tokens).
    max_tokens: 20000
    # Max context-fill percentage after adding result (mode: percentage).
    max_percentage: 95
  # Token threshold below which Read tool returns full file.
  read_small_file_tokens: 2000
  # Maximum total iterations before hard stop.
  max_steps: 50
  # Cumulative token limit (input + output). Once reached, the assistant stops.
  # 0 = disabled (default). Override per-run with --token-budget.
  token_budget: 0
  # Message returned when a tool result is rejected.
  rejection_message: >-
    Error: Tool result too large (%d tokens, limit %d).
    Try a more specific query or use offset/limit parameters.
  # Bash tool configuration.
  bash:
    # Output truncation.
    # Byte cap: stdout/stderr each capped at max_output_bytes during capture (prevents OOM).
    # Line truncation: output exceeding max_lines is middle-truncated (first head + last tail lines).
    # Full output is saved to a temp file referenced in the truncation message.
    truncation:
      max_output_bytes: 1048576  # 1MB per stream; 0 = disabled
      max_lines: 200
      head_lines: 100
      tail_lines: 80
    # Go text/template applied to every Bash tool command before execution.
    # Receives {{ .Command }} (original command) and sprig functions.
    # Empty = no rewrite.
    rewrite: ""
  # Max response body size in bytes for WebFetch tool (default: 5MB).
  webfetch_max_body_size: 5242880
  # Enable parallel execution of independent tool calls within a single LLM turn.
  # Omit or set true = parallel (default). false = sequential.
  # parallel: false
  # Max context-fill percentage after adding a user message.
  # Prevents massive input from causing unrecoverable context exhaustion.
  # 0 = disabled.
  user_input_max_percentage: 80
  # Read-before enforcement policy. Toggle at runtime with /readbefore or /rb.
  read_before:
    write: true    # require read before overwriting existing files (default: true)
    delete: false  # require read before deleting files (default: false)
  # Default tool include patterns (glob). Empty = all tools.
  # Equivalent to --include-tools CLI flag. CLI flags override this.
  enabled: []
  # Default tool exclude patterns (glob). Empty = none excluded.
  # Equivalent to --exclude-tools CLI flag. CLI flags override this.
  disabled: ["mcp__*"]
  # Tools hidden unless explicitly named in an enabled list.
  # The bare "*" wildcard does NOT satisfy opt-in — only named references do.
  opt_in: []
  # Glob patterns for tools to defer (e.g., ["Vision*", "Speak", "mcp__github*"]).
  # Deferred tools are excluded from request.Tools and instead listed in a
  # lightweight prompt block. The LoadTools meta-tool loads them on demand,
  # and once loaded they persist for the remainder of the session.
  deferred: []
```
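Putting the filtering knobs together, a hypothetical setup that hides MCP tools by default, gates one tool behind opt-in, and defers heavy tools might read:

```yaml
# Illustrative filtering combination (tool names are examples):
tools:
  disabled: ["mcp__*"]     # exclude all MCP tools unless CLI flags override
  opt_in: ["Speak"]        # hidden unless an enabled list names it explicitly
  deferred: ["Vision*"]    # advertised in a prompt block, loaded via LoadTools
```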
Sandbox
File: features/sandbox.yaml
Filesystem sandbox via Landlock, restricting the tools the LLM can access.
```yaml
sandbox:
  # Global enforcement toggle.
  enabled: true
  # Filesystem restriction lists (support $HOME expansion).
  restrictions:
    ro: []  # read-only paths
    rw: []  # read-write paths
  # Additional paths additive on top of restrictions.
  # Merged from agent and mode frontmatter features.sandbox.extra.
  # extra:
  #   ro: []
  #   rw: []
```
Toggle at runtime with /sandbox or /landlock.
The sandbox has two runtime states: requested (what the user/config wants) and enabled (whether Landlock is actually enforcing). These can differ when the kernel does not support Landlock — requested is true but enabled is false. Both states are available in prompt templates as {{ .Sandbox.Requested }} and {{ .Sandbox.Enabled }}.
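A hypothetical restriction set (paths are illustrative) that lets tools read the whole project but write only to a build directory:

```yaml
# Illustrative: project is readable, only build/ is writable.
# Paths support $HOME expansion.
sandbox:
  enabled: true
  restrictions:
    ro: ["$HOME/projects/myapp"]
    rw: ["$HOME/projects/myapp/build"]
```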
Subagent
File: features/subagent.yaml
Configuration for the subagent runner — a one-shot mini tool loop with isolated conversation context used by the Task tool.
```yaml
subagent:
  # Maximum tool-use iterations for a single subagent run.
  max_steps: 25
  # Default agent used when no agent is specified in the Task tool call.
  # Empty = use the parent (calling) agent.
  default_agent: ""
```
Feature resolution: All subagents resolve their own features independently — the merge chain is global → child agent → child mode, not the parent’s merged features. This means a subagent with features.subagent.max_steps: 50 uses that value regardless of what the parent agent sets. When no agent name is specified, it resolves to default_agent, then falls back to the parent’s agent name — but always creates a fresh, isolated instance.
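Sketching the example from the paragraph above as child-agent frontmatter (the agent itself is hypothetical):

```yaml
# Child agent frontmatter: when this agent runs as a subagent it gets
# 50 steps, regardless of the parent's settings, because subagents merge
# global → child agent → child mode.
---
features:
  subagent:
    max_steps: 50
---
```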
Plugins
File: features/plugins.yaml
Configuration for Go plugin loading.
```yaml
plugins:
  # Plugin directory relative to config home. Empty = "plugins/".
  dir: ""
  # Allow loading plugins compiled without safety checks.
  # Set to true only if you trust all plugins in .aura/plugins/.
  unsafe: false
  # Load only matching plugin names. Empty = load all.
  include: []
  # Skip matching plugin names. Applied after include.
  exclude: []
  # Plugin configuration values.
  # Merge order: plugin.yaml defaults (lowest) → global → local (highest).
  config:
    # Key-value pairs sent to ALL plugins via sdk.Context.PluginConfig.
    global: {}
    # Per-plugin overrides keyed by plugin name.
    local: {}
    # local:
    #   failure-circuit-breaker:
    #     max_failures: 5
    #   todo-reminder:
    #     interval: 10
    #   session-stats:
    #     interval: 10
    #     top_tools: 5
```
MCP
File: features/mcp.yaml
Server-level filtering for MCP connections. Same glob pattern system as tool filtering.
```yaml
mcp:
  # Default include patterns (glob). Empty = connect all enabled servers.
  # Equivalent to --include-mcps CLI flag. CLI flags override this.
  enabled: []
  # Default exclude patterns (glob). Empty = skip none.
  # Equivalent to --exclude-mcps CLI flag. CLI flags override this.
  disabled: []
```
CLI flags: --include-mcps "context7,git*", --exclude-mcps "portainer".
Guardrail
File: features/guardrail.yaml
Secondary LLM validation of tool calls and user messages. Disabled by default. Each scope (tool_calls, user_messages) is independently configurable with its own agent or prompt.
```yaml
guardrail:
  # "block" (reject), "log" (notice + proceed), "" (disabled, default).
  mode: ""
  # Error policy — what happens when the guardrail check itself fails
  # (timeout, network error, model unavailable).
  # "block" = fail-closed (default when mode: block).
  # "allow" = fail-open (default when mode: log).
  # Omit to inherit from mode.
  # on_error: ""
  # Max duration per check. Default: 2m when enabled.
  timeout: 2m
  # Independent scopes — a scope is active when agent or prompt is set.
  scope:
    tool_calls:
      agent: ""   # dedicated guardrail agent name
      prompt: ""  # OR: named prompt for self-guardrail (uses current model)
    user_messages:
      agent: ""   # dedicated guardrail agent name
      prompt: ""  # OR: named prompt for self-guardrail (uses current model)
  # Filter which tools trigger guardrail checks (tool_calls scope only).
  tools:
    enabled: []   # glob patterns — only check matching tools (empty = all)
    disabled: []  # glob patterns — skip matching tools (applied after enabled)
```
See Guardrail feature for detailed behavior, response protocol, and error handling.
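A hypothetical minimal activation (the agent name is illustrative): log-only validation of Bash tool calls, failing open if the checker itself errors:

```yaml
# Illustrative: log-mode guardrail on Bash calls only, fail-open on errors.
guardrail:
  mode: "log"
  on_error: "allow"
  scope:
    tool_calls:
      agent: "Guardrail"   # example dedicated agent name
  tools:
    enabled: ["Bash"]      # only Bash calls are checked
```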
Estimation
File: features/estimation.yaml
Token estimation method used for context-fill calculations and tool result guards.
```yaml
estimation:
  # Estimation algorithm: "rough", "tiktoken", "rough+tiktoken", "native".
  # rough = chars / divisor
  # tiktoken = tiktoken encoding
  # rough+tiktoken = max of both (default)
  # native = provider tokenization (Anthropic, OpenAI, Google, Ollama, LlamaCPP)
  method: "rough+tiktoken"
  # Chars-per-token divisor for rough estimation.
  divisor: 4
  # Tiktoken encoding name.
  encoding: "cl100k_base"
```
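To make the rough method concrete: at `divisor: 4`, a 2,000-character message estimates to about 500 tokens, and the default `rough+tiktoken` method reports the larger of that figure and the tiktoken count. A cheaper, rough-only variant (an illustrative configuration) would be:

```yaml
# Hypothetical: rough-only estimation, e.g. when tiktoken overhead matters.
estimation:
  method: "rough"
  divisor: 4   # 2000 chars / 4 ≈ 500 estimated tokens
```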
Override Precedence
Feature values can be overridden at multiple levels. Each level merges non-zero values on top of the previous:
- Global defaults — `features/*.yaml` files (+ `ApplyDefaults()` for omitted fields)
- CLI flags — `--max-steps` overrides `tools.max_steps` before agent resolution
- Agent frontmatter — `features:` block in agent `.md` files (see Agents)
- Mode frontmatter — `features:` block in mode `.md` files (see Modes)
- Task definition — `features:` block in task YAML (see Tasks)
Only non-zero values override — omitting a field preserves the value from the previous level. This means an agent can override compaction.threshold without affecting compaction.keep_last_messages.
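For instance, a partial override in agent frontmatter (the value is illustrative) touches only the field it names:

```yaml
# Agent frontmatter: raise the compaction trigger for this agent only.
# keep_last_messages and all other compaction fields keep their
# global values, because only non-zero fields merge.
---
features:
  compaction:
    threshold: 90
---
```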