Embeddings
Embedding-based codebase search with optional reranking. Available through three interfaces:
- Query tool — LLM invokes it during conversations
- /query command — run from the slash command prompt
- aura query CLI — standalone command-line search

Pipeline
Files → Indexer → Chunker → Embedder → Vector DB → Query → Reranker → Results
- Indexer — Concurrent file walking (fastwalk, 20 workers) with SHA256 hash-based change detection. Only re-embeds files that have changed.
- Chunker — Splits files into token-aware chunks with configurable overlap.
- Embedder — Generates vector embeddings via the configured LLM provider.
- Vector DB — chromem-go persistent vector database stored at .aura/embeddings/. Collections are keyed by embedding model name, so changing the model automatically triggers a full re-index and cleans up stale collections.
- Reranker — Optional separate agent/model for reranking similarity results.
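
The hash-based change detection in the Indexer step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `file_digest`, `changed_files`, and the in-memory `known_hashes` manifest are hypothetical names, and how the real indexer persists its hashes is not described here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(paths, known_hashes):
    """Return (paths that need re-embedding, updated hash map).

    known_hashes maps path string -> digest from the previous run.
    It is a hypothetical manifest used only for this sketch.
    """
    to_embed = []
    updated = {}
    for path in paths:
        digest = file_digest(path)
        updated[str(path)] = digest
        # A missing or different digest means the file is new or edited.
        if known_hashes.get(str(path)) != digest:
            to_embed.append(path)
    return to_embed, updated
```

On a second run with the hash map returned by the first, only files whose contents changed come back in `to_embed`, which is what keeps re-indexing cheap.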
Chunking Strategies
| Strategy | Description |
|---|---|
| auto (default) | AST parsing for supported languages (tree-sitter), line-based for everything else |
| ast | Always use tree-sitter AST parsing. Falls back to line-based if unsupported. |
| line | Always use line-based splitting |
AST chunking extracts function, method, and type declarations as individual chunks, preserving semantic boundaries.
Supported languages: Go, Python, TypeScript, TSX, JavaScript, JSX, Rust, Java, C (.c, .h), C++ (.cc, .cpp, .hpp).
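
To illustrate what "one chunk per declaration" means, here is a small sketch that uses Python's standard-library `ast` module as a stand-in for tree-sitter. It is not how the tool parses code; it only shows the idea of cutting a file along function, method, and type boundaries. `declaration_chunks` is a hypothetical helper.

```python
import ast


def declaration_chunks(source: str):
    """Return (name, source_text) pairs for top-level functions and classes,
    plus one pair per method defined directly inside a class."""
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    chunks = []
    for node in tree.body:
        if isinstance(node, func_types):
            chunks.append((node.name, ast.get_source_segment(source, node)))
        elif isinstance(node, ast.ClassDef):
            chunks.append((node.name, ast.get_source_segment(source, node)))
            for item in node.body:
                if isinstance(item, func_types):
                    name = f"{node.name}.{item.name}"
                    # padded=True keeps the method's original indentation.
                    text = ast.get_source_segment(source, item, padded=True)
                    chunks.append((name, text))
    return chunks
```

Statements that are not declarations (imports, module-level assignments) are left out of this sketch; a real chunker would need a policy for them.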
Defaults
| Setting | Default |
|---|---|
| Strategy | auto |
| Max tokens per chunk | 500 |
| Overlap tokens | 75 |
| Token estimation | len(text) / 4 |
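
As a rough illustration of how these defaults interact, the sketch below estimates tokens as `len(text) / 4` (using integer division, which is an assumption) and builds line-aligned chunks with overlap. `estimate_tokens` and `chunk_lines` are hypothetical names, and the real chunker may handle oversized lines and overlap boundaries differently.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: one token per four characters."""
    return len(text) // 4


def chunk_lines(text: str, max_tokens: int = 500, overlap_tokens: int = 75):
    """Split text into line-aligned chunks of roughly max_tokens each.

    Each new chunk starts with the trailing lines of the previous chunk,
    up to overlap_tokens. A single oversized line becomes its own chunk.
    """
    chunks = []
    current = []
    current_tokens = 0
    for line in text.splitlines(keepends=True):
        line_tokens = estimate_tokens(line)
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append("".join(current))
            # Carry trailing lines of the finished chunk forward as overlap.
            carried = []
            carried_tokens = 0
            for prev in reversed(current):
                t = estimate_tokens(prev)
                if carried_tokens + t > overlap_tokens:
                    break
                carried.insert(0, prev)
                carried_tokens += t
            current, current_tokens = carried, carried_tokens
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks
```

With the defaults, a 2,000-character span counts as 500 tokens, so a chunk holds roughly 2 KB of source and shares about 300 characters with its neighbor.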
File Filtering
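Indexed files are controlled by gitignore patterns in the config. These are combined with any .gitignore at the project root. Under standard gitignore semantics, the example below ignores everything, then re-includes directories and the Go files under internal/, so only those files are indexed:
gitignore: |
  *
  !*/
  !internal/
  !internal/**/*.go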
Reranking
When a reranker agent is configured, the search fetches max_results * multiplier candidates from the vector DB, then reranks down to max_results using a dedicated model. Set agent: "" to skip reranking.
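
The over-fetch-then-trim flow described above can be sketched as follows. `vector_search` and `rerank_score` are hypothetical callables standing in for the vector DB query and the reranker model, and the default values for `max_results` and `multiplier` are illustrative only, not the tool's defaults.

```python
def search_with_rerank(query, vector_search, rerank_score,
                       max_results=5, multiplier=4):
    """Fetch extra candidates, rerank them, and keep the top max_results.

    vector_search(query, k) -> list of candidates in similarity order.
    rerank_score(query, candidate) -> float, higher is better.
    Pass rerank_score=None to model "no reranker configured".
    """
    if rerank_score is None:
        # Without a reranker there is no reason to over-fetch.
        return vector_search(query, max_results)
    candidates = vector_search(query, max_results * multiplier)
    ranked = sorted(candidates,
                    key=lambda c: rerank_score(query, c),
                    reverse=True)
    return ranked[:max_results]
```

A larger multiplier gives the reranker more candidates to promote at the cost of more reranker calls per query.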
Model Offloading
On VRAM-constrained devices (e.g., Jetson), the embedding and reranking models may not fit in memory simultaneously. The offload: flag explicitly loads/unloads models between pipeline stages:
- Embedding offload — preloads the embedding model before indexing, unloads it before reranking
- Reranking offload — loads the reranker model before reranking, unloads it after
Set offload: true at the top level for embeddings, under reranking: for the reranker, or both. No-op if the provider doesn’t support model lifecycle control (only Ollama currently does).
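
The ordering that offloading enforces can be pictured with the sketch below. It does not call any real provider: `RecordingProvider`, `resident`, and `run_pipeline` are hypothetical, and the provider simply records load and unload events so the sequence is visible.

```python
from contextlib import contextmanager


class RecordingProvider:
    """Stand-in provider that records lifecycle calls (hypothetical API)."""

    def __init__(self):
        self.events = []

    def load(self, model):
        self.events.append(("load", model))

    def unload(self, model):
        self.events.append(("unload", model))


@contextmanager
def resident(provider, model, offload):
    """Keep a model loaded only for the duration of the block."""
    if offload:
        provider.load(model)
    try:
        yield
    finally:
        if offload:
            provider.unload(model)


def run_pipeline(provider, embed_model, rerank_model,
                 embed_offload, rerank_offload):
    """Run the two stages in order, so the models never overlap in memory."""
    with resident(provider, embed_model, embed_offload):
        provider.events.append(("stage", "embed"))
    with resident(provider, rerank_model, rerank_offload):
        provider.events.append(("stage", "rerank"))
    return provider.events
```

With both flags on, the recorded sequence is load, embed, unload for the embedding model, followed by load, rerank, unload for the reranker. With both off, only the two stage events appear.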
Configuration
See Embeddings Config for the full YAML schema.
Query Tool Parameters
The Query tool accepts these parameters when invoked by the LLM:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Search query for embedding-based similarity |
| k | int | from config | Number of results to return |
| full_content | bool | false | Return chunk content in results |
| reranking | bool | true | Include reranking pass (set false to skip even if reranker is configured) |
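
One way to read the defaults in this table is sketched below. `QueryParams` and `resolve` are hypothetical names. The sketch assumes that the configured fallback for `k` is the `max_results` setting and that an empty reranker agent string means no reranker is configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryParams:
    query: str
    k: Optional[int] = None      # None means "use the configured value"
    full_content: bool = False
    reranking: bool = True


def resolve(params: QueryParams, config_max_results: int, reranker_agent: str):
    """Return (k, use_reranker) after applying config fallbacks."""
    if not params.query:
        raise ValueError("query is required")
    k = params.k if params.k is not None else config_max_results
    # Reranking runs only if requested AND a reranker agent is configured.
    use_reranker = params.reranking and reranker_agent != ""
    return k, use_reranker
```

For example, `resolve(QueryParams("token counting"), 5, "")` yields `(5, False)`: the result count falls back to the config, and reranking is skipped because no agent is set.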
CLI Usage
# Search for relevant code
aura query "token counting"
# Return more results
aura query -k 10 "configuration loading"
# Reindex only (no search)
aura query