Embeddings

Embedding-based codebase search with optional reranking. Available through three interfaces:

  • Query tool — LLM invokes it during conversations
  • /query command — run from the slash command prompt
  • aura query CLI — standalone command-line search

Pipeline

Files → Indexer → Chunker → Embedder → Vector DB → Query → Reranker → Results
  1. Indexer — Concurrent file walking (fastwalk, 20 workers) with SHA256 hash-based change detection. Only re-embeds files that have changed.
  2. Chunker — Splits files into token-aware chunks with configurable overlap.
  3. Embedder — Generates vector embeddings via the configured LLM provider.
  4. Vector DB — chromem-go persistent vector database stored at .aura/embeddings/. Collections are keyed by embedding model name — changing the model automatically triggers a full re-index and cleans up stale collections.
  5. Reranker — Optional separate agent/model for reranking similarity results.
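
The hash-based change detection in step 1 can be sketched as follows. This is a minimal illustration, not Aura's actual indexer: the function names and the in-memory `index` map are hypothetical, and file contents are passed in directly rather than read from disk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashContent returns the hex-encoded SHA-256 digest of a file's bytes.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// changedFiles compares each file's current hash with the hash stored in
// index (hypothetical: path -> last seen hash). It returns the sorted paths
// whose content is new or modified and records their new hashes.
func changedFiles(index map[string]string, files map[string][]byte) []string {
	var changed []string
	for path, data := range files {
		h := hashContent(data)
		if index[path] != h {
			changed = append(changed, path)
			index[path] = h
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	index := map[string]string{}
	files := map[string][]byte{
		"a.go": []byte("package a"),
		"b.go": []byte("package b"),
	}
	fmt.Println(changedFiles(index, files)) // first run: every file is new

	files["b.go"] = []byte("package b // edited")
	fmt.Println(changedFiles(index, files)) // second run: only b.go needs re-embedding
}
```

Only the paths returned by `changedFiles` would be passed on to the chunker and embedder; unchanged files keep their existing vectors.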

Chunking Strategies

Strategy        Description
auto (default)  AST parsing for supported languages (tree-sitter), line-based for everything else
ast             Always use tree-sitter AST parsing. Falls back to line-based if unsupported.
line            Always use line-based splitting

AST chunking extracts function, method, and type declarations as individual chunks, preserving semantic boundaries.

Supported languages: Go, Python, TypeScript, TSX, JavaScript, JSX, Rust, Java, C (.c, .h), C++ (.cc, .cpp, .hpp).

Defaults

Setting               Default
Strategy              auto
Max tokens per chunk  500
Overlap tokens        75
Token estimation      len(text) / 4
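
A rough sketch of how these defaults could interact in line-based splitting is shown below. The `len(text) / 4` estimate comes from the table above; everything else (the `chunkLines` name, counting tokens per line, carrying whole trailing lines as overlap) is an assumption made for illustration and may differ from Aura's chunker.

```go
package main

import (
	"fmt"
	"strings"
)

// estimateTokens applies the len(text) / 4 heuristic.
func estimateTokens(text string) int {
	return len(text) / 4
}

// chunkLines groups lines into chunks whose summed per-line estimate stays
// within maxTokens where possible. When a chunk is closed, trailing lines
// worth up to overlapTokens are repeated at the start of the next chunk.
// A single line larger than maxTokens still becomes its own chunk.
func chunkLines(text string, maxTokens, overlapTokens int) []string {
	var chunks, cur []string
	curTokens := 0
	for _, line := range strings.Split(text, "\n") {
		t := estimateTokens(line + "\n")
		if len(cur) > 0 && curTokens+t > maxTokens {
			chunks = append(chunks, strings.Join(cur, "\n"))
			// Walk backwards to collect the overlap carried into the next chunk.
			var carry []string
			carryTokens := 0
			for i := len(cur) - 1; i >= 0; i-- {
				lt := estimateTokens(cur[i] + "\n")
				if carryTokens+lt > overlapTokens {
					break
				}
				carry = append([]string{cur[i]}, carry...)
				carryTokens += lt
			}
			cur, curTokens = carry, carryTokens
		}
		cur = append(cur, line)
		curTokens += t
	}
	if len(cur) > 0 {
		chunks = append(chunks, strings.Join(cur, "\n"))
	}
	return chunks
}

func main() {
	text := "line-00\nline-01\nline-02\nline-03\nline-04"
	// Tiny limits so the overlap is visible; the documented defaults are 500 and 75.
	for i, c := range chunkLines(text, 6, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With the tiny limits in `main`, each line counts as two estimated tokens, so every chunk holds three lines and shares one line with its neighbor.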

File Filtering

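Which files are indexed is controlled by gitignore-style patterns in the config, combined with any .gitignore at the project root. The example below excludes everything, then re-includes directories and the Go files under internal/: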

gitignore: |
  *
  !*/
  !internal/
  !internal/**/*.go

Reranking

When a reranker agent is configured, the search fetches max_results * multiplier candidates from the vector DB, then reranks down to max_results using a dedicated model. Set agent: "" to skip reranking.
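
The over-fetch-then-rerank flow can be sketched like this. The `Candidate` and `Scorer` types and the `search` function are hypothetical stand-ins for the vector DB and the reranker model, and fetching only max_results candidates when no reranker is set is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical search hit with its current relevance score.
type Candidate struct {
	ID    string
	Score float64
}

// Scorer stands in for the reranker model: it rescores one candidate for a query.
type Scorer func(query string, c Candidate) float64

// search fetches maxResults*multiplier candidates when a reranker is present,
// rescores and re-sorts them, then truncates to maxResults.
// With a nil reranker it fetches maxResults and keeps the original order.
func search(query string, fetch func(n int) []Candidate, rerank Scorer, maxResults, multiplier int) []Candidate {
	n := maxResults
	if rerank != nil {
		n = maxResults * multiplier
	}
	cands := fetch(n)
	if rerank != nil {
		for i := range cands {
			cands[i].Score = rerank(query, cands[i])
		}
		sort.SliceStable(cands, func(i, j int) bool { return cands[i].Score > cands[j].Score })
	}
	if len(cands) > maxResults {
		cands = cands[:maxResults]
	}
	return cands
}

func main() {
	// Hypothetical vector DB contents, already ordered by embedding similarity.
	store := []Candidate{{"chunk-a", 0.91}, {"chunk-b", 0.88}, {"chunk-c", 0.80}, {"chunk-d", 0.75}}
	fetch := func(n int) []Candidate {
		if n > len(store) {
			n = len(store)
		}
		return append([]Candidate(nil), store[:n]...) // copy so the store is not mutated
	}
	// Toy reranker that strongly prefers chunk-d.
	rerank := func(_ string, c Candidate) float64 {
		if c.ID == "chunk-d" {
			return 1.0
		}
		return c.Score / 2
	}
	for _, c := range search("token counting", fetch, rerank, 2, 2) {
		fmt.Println(c.ID)
	}
}
```

The multiplier matters because a chunk that ranks fourth by raw similarity (chunk-d here) can only reach the final results if it was fetched as a candidate in the first place.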

Model Offloading

On VRAM-constrained devices (e.g., Jetson), the embedding and reranking models may not fit in memory simultaneously. The offload: flag explicitly loads/unloads models between pipeline stages:

  • Embedding offload — preloads the embedding model before indexing, unloads it before reranking
  • Reranking offload — loads the reranker model before reranking, unloads it after

Set offload: true at the top level for embeddings, under reranking: for the reranker, or both. The flag is a no-op if the provider doesn't support model lifecycle control (currently only Ollama does).

Configuration

See Embeddings Config for the full YAML schema.

Query Tool Parameters

The Query tool accepts these parameters when invoked by the LLM:

Parameter     Type    Default      Description
query         string  (required)   Search query for embedding-based similarity
k             int     from config  Number of results to return
full_content  bool    false        Return chunk content in results
reranking     bool    true         Include reranking pass (set false to skip even if reranker is configured)

CLI Usage

# Search for relevant code
aura query "token counting"

# Return more results
aura query -k 10 "configuration loading"

# Reindex only (no search)
aura query

Copyright © 2026 idelchi. Distributed under the MIT License.