Providers

Providers are YAML files in .aura/config/providers/. Each file configures a connection to an LLM backend.

Provider Types

Type        Protocol                        Capabilities
ollama      Native Ollama API               Chat, embedding, thinking, vision
llamacpp    OpenAI-compatible               Chat, reranking, thinking, vision; also used for whisper (STT) and kokoro (TTS)
openrouter  OpenRouter API                  Chat, embedding (cloud models, token auth)
openai      OpenAI Responses API            Chat, embedding, transcription, synthesis
anthropic   Native Anthropic Messages API   Chat, thinking, vision, tools
google      Native Gemini API               Chat, thinking, vision, tools, embedding
copilot     GitHub Copilot (dual-protocol)  Chat (GPT via Responses API, Claude via Messages API)
codex       OpenAI Plus (Responses API)     Chat (ChatGPT Plus/Pro subscription)

All providers implement a 4-method core interface (Chat, Models, Model, Estimate). Optional capabilities (Embed, Rerank, Transcribe, Synthesize, ModelLoader) use opt-in interfaces — providers implement only what they support, and callers discover capabilities via providers.As[T](provider).

YAML Schema

provider_name:
  # API endpoint URL. Required for most providers; optional for types with a
  # default endpoint (e.g., anthropic, google).
  url: http://host.docker.internal:11434

  # Provider type — determines which API protocol to use (required).
  # Values: ollama, llamacpp, openrouter, openai, anthropic, google, copilot, codex.
  type: ollama

  # Authentication token.
  # Falls back to AURA_PROVIDERS_{NAME}_TOKEN environment variable.
  # token: ""

  # How long models stay loaded in VRAM (Ollama only).
  # Uses Go duration syntax: "5m", "1h", "30s".
  # keep_alive: 15m

  # HTTP response header timeout — how long to wait for the server to start responding.
  # Does not affect streaming responses. Default: 5m.
  # timeout: 5m

  # Model visibility filter for `aura models` and `/model`.
  # Does NOT affect --model flag or feature agents.
  models:
    include: []    # Glob patterns — empty means all (e.g., ["llama*", "qwen*"])
    exclude: []    # Glob patterns — applied after include (e.g., ["*embed*"])

  # Declared capabilities for this provider. Empty = all capabilities assumed.
  # Values: chat, embed, rerank, transcribe, synthesize.
  # Use to restrict a provider to specific roles (e.g., embed-only provider).
  # capabilities: []

  # Retry configuration for transient Chat() failures (rate limits, 5xx, network errors).
  # Only applies to native providers (ollama, llamacpp) — Fantasy-based providers
  # (anthropic, openai, google, openrouter, copilot, codex) have built-in retry.
  # Disabled by default (max_attempts: 0).
  retry:
    max_attempts: 0   # 0 = disabled. Number of retry attempts before giving up.
    base_delay: 1s    # Initial delay between retries (exponential backoff).
    max_delay: 30s    # Maximum delay between retries.

Token Resolution

Authentication tokens are resolved in order:

  1. token field in the provider YAML — supports $VAR and ${VAR} env var expansion
  2. AURA_PROVIDERS_{NAME}_TOKEN environment variable (automatic fallback when token is empty)
  3. Value from env file loaded via --env-file

This means you can reference existing environment variables directly:

openrouter:
  type: openrouter
  url: https://openrouter.ai/api/v1
  token: ${OPENROUTER_API_KEY}

anthropic:
  type: anthropic
  token: ${ANTHROPIC_API_KEY}

Example

my_ollama:
  url: http://host.docker.internal:11434
  type: ollama
  keep_alive: 15m
  models:
    exclude: ["*embed*"]

OpenAI-Compatible Example

The openai type works with any service that implements the OpenAI Responses API (/v1/responses):

openai:
  url: https://api.openai.com/v1
  type: openai
  # token from AURA_PROVIDERS_OPENAI_TOKEN env var
  timeout: 5m

# groq:
#   url: https://api.groq.com/openai/v1
#   type: openai

# deepseek:
#   url: https://api.deepseek.com/v1
#   type: openai

Anthropic Example

The anthropic type uses the native Anthropic Messages API with full streaming, tool use, thinking, and vision support:

anthropic:
  type: anthropic
  # token from AURA_PROVIDERS_ANTHROPIC_TOKEN env var
  timeout: 5m

URL is optional — defaults to https://api.anthropic.com. Token uses the X-Api-Key header (handled by the SDK).

Google Gemini Example

The google type uses the native Gemini API with streaming, tool use, thinking (with signature preservation), vision, and embeddings:

google:
  type: google
  # token from AURA_PROVIDERS_GOOGLE_TOKEN env var
  timeout: 5m

URL is optional — defaults to Google’s Gemini API endpoint. Token uses the ?key= query parameter (handled by the SDK).

GitHub Copilot Example

The copilot type routes requests to either the OpenAI Responses API (GPT models) or Anthropic Messages API (Claude models) through your GitHub Copilot subscription:

copilot:
  type: copilot
  timeout: 5m

Authenticate via aura login copilot (device code flow) or set AURA_PROVIDERS_COPILOT_TOKEN to a GitHub OAuth token (ghu_...).

OpenAI Plus (Codex) Example

The codex type accesses ChatGPT Plus/Pro models via the Codex endpoint:

codex:
  type: codex
  timeout: 5m

Authenticate via aura login codex (device code flow) or set AURA_PROVIDERS_CODEX_TOKEN to an OpenAI refresh token (rt_...).


Catwalk Registry

Model capabilities (context length, vision, thinking, thinking levels) are enriched at startup using Catwalk metadata. Aura ships with embedded data compiled into the binary, so enrichment works offline out of the box. On startup, it also attempts a live fetch from the Catwalk API and caches the result to disk. If the fetch fails, the disk cache is tried, then the embedded data.

Enrichment only fills gaps — it never overwrites capabilities already reported by the provider API. Currently it applies to the anthropic, openai, and google providers. The Anthropic and OpenAI listing APIs return only model IDs (no capabilities), so those providers get full enrichment. Google returns context length, thinking, and tools from its API, so only vision is filled from Catwalk. The remaining providers (Ollama, OpenRouter, LlamaCPP, Copilot, Codex) build all capabilities inline from their rich API responses.


You can add as many providers as you need — create new YAML files in providers/. Any provider type can be used with custom URLs (e.g., a remote Ollama instance, a self-hosted OpenAI-compatible server).

To see the default providers that ship with aura init, inspect .aura/config/providers/ after scaffolding.



Copyright © 2026 idelchi. Distributed under the MIT License.