Provider Management

This page covers provider catalog loading, model metadata, aliases, routing, spend controls, fallback behavior, REST endpoints, channel commands, and security notes.

Included Topics

Dynamic Provider Loading
Model Catalog
Model Aliases
Per-Agent Model Override
Model Routing
Cost Tracking
Fallback Providers
API Endpoints
Channel Commands
Environment Variables Summary
Security Notes

Dynamic Provider Loading

Official provider definitions are pre-installed to ~/.librefang/providers/ from the registry. To add or override providers, place custom .toml files in the same directory. Each file defines one provider:

# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true

[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false

Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.
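The merge can be pictured as an overlay keyed by provider id. The sketch below is not LibreFang's loader: `ProviderDef` and `merge_catalog` are hypothetical names, and it assumes that a custom file whose id matches a builtin provider replaces the builtin entry.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down provider definition (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct ProviderDef {
    id: String,
    base_url: String,
}

/// Overlay custom definitions on top of the builtin ones, keyed by `id`.
/// Assumption: a custom entry with an existing id replaces the builtin entry.
fn merge_catalog(
    builtin: Vec<ProviderDef>,
    custom: Vec<ProviderDef>,
) -> BTreeMap<String, ProviderDef> {
    let mut catalog = BTreeMap::new();
    for def in builtin.into_iter().chain(custom) {
        // Later inserts win, so custom definitions override builtin ones.
        catalog.insert(def.id.clone(), def);
    }
    catalog
}
```

Under that assumption, a custom file with a new id adds a provider, and one that reuses a builtin id replaces it.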

Custom Image-Generation Providers

A media driver name that is not in the builtin list (openai, gemini, elevenlabs, minimax, google_tts) previously failed with "Unknown media provider". A custom provider with a URL configured in [provider_urls] now falls through to a generic OpenAI-compatible image driver. Configure the URL in config.toml and export an API key for each provider:

# config.toml
[provider_urls]
volcengine = "https://open.volcengineapi.com/v1"
tongyi     = "https://dashscope.aliyuncs.com/compatible-mode/v1"

export VOLCENGINE_API_KEY="sk-..."
export TONGYI_API_KEY="sk-..."

The driver expects the standard OpenAI Images API request and response shape: a /images/generations endpoint that returns b64_json or url. To route image-generation calls to a custom provider, set image_provider = "volcengine" (or whichever id you used in [provider_urls]) under [media].

Driver-side File Upload

The Moonshot (Kimi) driver uploads attachments to /v1/files before forwarding the request to /v1/chat/completions, so messages with image, PDF, or text attachments work end-to-end without any per-attachment marshalling on your side. Before upload, each file is written to std::env::temp_dir(), its content type is validated, its size is checked against a limit, and its filename is sanitised. No extra configuration is needed: attach files via POST /api/agents/{id}/upload or the dashboard chat as usual.

Provider Regions

Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:

# Excerpts from two providers' registry TOML files (Qwen, then MiniMax):
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY"    # Optional: override the default API key env var

Select a region in config.toml:

[provider_regions]
qwen = "intl"
minimax = "china"

Priority: region selections are applied first, and explicit [provider_urls] entries are applied on top of them. If both are set for the same provider, the [provider_urls] entry wins.
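The priority rule amounts to a three-step lookup. The sketch below is illustrative only: `resolve_base_url` and its parameters are hypothetical names, and `region_urls` is assumed to already hold the URL of the region selected in [provider_regions].

```rust
use std::collections::HashMap;

/// Pick the effective base URL for a provider.
/// Order: explicit [provider_urls] entry, then selected region, then default.
fn resolve_base_url<'a>(
    provider: &str,
    default_url: &'a str,
    region_urls: &'a HashMap<String, String>,
    provider_urls: &'a HashMap<String, String>,
) -> &'a str {
    provider_urls
        .get(provider)
        .or_else(|| region_urls.get(provider))
        .map(String::as_str)
        .unwrap_or(default_url)
}
```

A provider with neither a region selection nor a [provider_urls] entry keeps its default URL.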

Model Catalog

A representative subset of the 230+ builtin models, grouped by provider. Pricing is in USD per million tokens.

#	Model ID	Display Name	Provider	Tier	Context Window	Max Output	Input $/M	Output $/M	Tools	Vision
1	`claude-opus-4-20250514`	Claude Opus 4	anthropic	Frontier	200,000	32,000	$15.00	$75.00	Yes	Yes
2	`claude-sonnet-4-20250514`	Claude Sonnet 4	anthropic	Smart	200,000	64,000	$3.00	$15.00	Yes	Yes
3	`claude-haiku-4-5-20251001`	Claude Haiku 4.5	anthropic	Fast	200,000	8,192	$0.25	$1.25	Yes	Yes
4	`gpt-4.1`	GPT-4.1	openai	Frontier	1,047,576	32,768	$2.00	$8.00	Yes	Yes
5	`gpt-4o`	GPT-4o	openai	Smart	128,000	16,384	$2.50	$10.00	Yes	Yes
6	`o3-mini`	o3-mini	openai	Smart	200,000	100,000	$1.10	$4.40	Yes	No
7	`gpt-4.1-mini`	GPT-4.1 Mini	openai	Balanced	1,047,576	32,768	$0.40	$1.60	Yes	Yes
8	`gpt-4o-mini`	GPT-4o Mini	openai	Fast	128,000	16,384	$0.15	$0.60	Yes	Yes
9	`gpt-4.1-nano`	GPT-4.1 Nano	openai	Fast	1,047,576	32,768	$0.10	$0.40	Yes	No
10	`gemini-2.5-pro`	Gemini 2.5 Pro	gemini	Frontier	1,048,576	65,536	$1.25	$10.00	Yes	Yes
11	`gemini-2.5-flash`	Gemini 2.5 Flash	gemini	Smart	1,048,576	65,536	$0.15	$0.60	Yes	Yes
12	`gemini-2.0-flash`	Gemini 2.0 Flash	gemini	Fast	1,048,576	8,192	$0.10	$0.40	Yes	Yes
13	`deepseek-chat`	DeepSeek V3	deepseek	Smart	64,000	8,192	$0.27	$1.10	Yes	No
14	`deepseek-reasoner`	DeepSeek R1	deepseek	Smart	64,000	8,192	$0.55	$2.19	No	No
15	`llama-3.3-70b-versatile`	Llama 3.3 70B	groq	Balanced	128,000	32,768	$0.059	$0.079	Yes	No
16	`mixtral-8x7b-32768`	Mixtral 8x7B	groq	Balanced	32,768	4,096	$0.024	$0.024	Yes	No
17	`llama-3.1-8b-instant`	Llama 3.1 8B	groq	Fast	128,000	8,192	$0.05	$0.08	Yes	No
18	`gemma2-9b-it`	Gemma 2 9B	groq	Fast	8,192	4,096	$0.02	$0.02	No	No
19	`openrouter/google/gemini-2.5-flash`	Gemini 2.5 Flash (OpenRouter)	openrouter	Smart	1,048,576	65,536	$0.15	$0.60	Yes	Yes
20	`openrouter/anthropic/claude-sonnet-4`	Claude Sonnet 4 (OpenRouter)	openrouter	Smart	200,000	64,000	$3.00	$15.00	Yes	Yes
21	`openrouter/openai/gpt-4o`	GPT-4o (OpenRouter)	openrouter	Smart	128,000	16,384	$2.50	$10.00	Yes	Yes
22	`openrouter/deepseek/deepseek-chat`	DeepSeek V3 (OpenRouter)	openrouter	Smart	128,000	32,768	$0.14	$0.28	Yes	No
23	`openrouter/meta-llama/llama-3.3-70b-instruct`	Llama 3.3 70B (OpenRouter)	openrouter	Balanced	128,000	32,768	$0.39	$0.39	Yes	No
24	`openrouter/qwen/qwen-2.5-72b-instruct`	Qwen 2.5 72B (OpenRouter)	openrouter	Balanced	128,000	32,768	$0.36	$0.36	Yes	No
25	`openrouter/google/gemini-2.5-pro`	Gemini 2.5 Pro (OpenRouter)	openrouter	Frontier	1,048,576	65,536	$1.25	$10.00	Yes	Yes
26	`openrouter/mistralai/mistral-large-latest`	Mistral Large (OpenRouter)	openrouter	Smart	128,000	8,192	$2.00	$6.00	Yes	No
27	`openrouter/google/gemma-2-9b-it`	Gemma 2 9B (OpenRouter)	openrouter	Fast	8,192	4,096	$0.00	$0.00	No	No
28	`openrouter/deepseek/deepseek-r1`	DeepSeek R1 (OpenRouter)	openrouter	Frontier	128,000	32,768	$0.55	$2.19	No	No
29	`mistral-large-latest`	Mistral Large	mistral	Smart	128,000	8,192	$2.00	$6.00	Yes	No
30	`codestral-latest`	Codestral	mistral	Smart	32,000	8,192	$0.30	$0.90	Yes	No
31	`mistral-small-latest`	Mistral Small	mistral	Fast	128,000	8,192	$0.10	$0.30	Yes	No
32	`meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo`	Llama 3.1 405B (Together)	together	Frontier	130,000	4,096	$3.50	$3.50	Yes	No
33	`Qwen/Qwen2.5-72B-Instruct-Turbo`	Qwen 2.5 72B (Together)	together	Smart	32,768	4,096	$0.20	$0.60	Yes	No
34	`mistralai/Mixtral-8x22B-Instruct-v0.1`	Mixtral 8x22B (Together)	together	Balanced	65,536	4,096	$0.60	$0.60	Yes	No
35	`accounts/fireworks/models/llama-v3p1-405b-instruct`	Llama 3.1 405B (Fireworks)	fireworks	Frontier	131,072	16,384	$3.00	$3.00	Yes	No
36	`accounts/fireworks/models/mixtral-8x22b-instruct`	Mixtral 8x22B (Fireworks)	fireworks	Balanced	65,536	4,096	$0.90	$0.90	Yes	No
37	`llama3.2`	Llama 3.2 (Ollama)	ollama	Local	128,000	4,096	$0.00	$0.00	Yes	No
38	`mistral:latest`	Mistral (Ollama)	ollama	Local	32,768	4,096	$0.00	$0.00	Yes	No
39	`phi3`	Phi-3 (Ollama)	ollama	Local	128,000	4,096	$0.00	$0.00	No	No
40	`vllm-local`	vLLM Local Model	vllm	Local	32,768	4,096	$0.00	$0.00	Yes	No
41	`lmstudio-local`	LM Studio Local Model	lmstudio	Local	32,768	4,096	$0.00	$0.00	Yes	No
42	`sonar-pro`	Sonar Pro	perplexity	Smart	200,000	8,192	$3.00	$15.00	No	No
43	`sonar`	Sonar	perplexity	Balanced	128,000	8,192	$1.00	$5.00	No	No
44	`command-r-plus`	Command R+	cohere	Smart	128,000	4,096	$2.50	$10.00	Yes	No
45	`command-r`	Command R	cohere	Balanced	128,000	4,096	$0.15	$0.60	Yes	No
46	`jamba-1.5-large`	Jamba 1.5 Large	ai21	Smart	256,000	4,096	$2.00	$8.00	Yes	No
47	`cerebras/llama3.3-70b`	Llama 3.3 70B (Cerebras)	cerebras	Balanced	128,000	8,192	$0.06	$0.06	Yes	No
48	`cerebras/llama3.1-8b`	Llama 3.1 8B (Cerebras)	cerebras	Fast	128,000	8,192	$0.01	$0.01	Yes	No
49	`sambanova/llama-3.3-70b`	Llama 3.3 70B (SambaNova)	sambanova	Balanced	128,000	8,192	$0.06	$0.06	Yes	No
50	`grok-2`	Grok 2	xai	Smart	131,072	32,768	$2.00	$10.00	Yes	Yes
51	`grok-2-mini`	Grok 2 Mini	xai	Fast	131,072	32,768	$0.30	$0.50	Yes	No
52	`hf/meta-llama/Llama-3.3-70B-Instruct`	Llama 3.3 70B (HF)	huggingface	Balanced	128,000	4,096	$0.30	$0.30	No	No
53	`replicate/meta-llama-3.3-70b-instruct`	Llama 3.3 70B (Replicate)	replicate	Balanced	128,000	4,096	$0.40	$0.40	No	No

Model Tiers:

Tier	Description	Typical Use
Frontier	Most capable, highest cost	Orchestration, architecture, security audits
Smart	Strong reasoning, moderate cost	Coding, code review, research, analysis
Balanced	Good cost/quality tradeoff	Planning, writing, DevOps, day-to-day tasks
Fast	Cheapest cloud inference	Ops, translation, simple Q&A, health checks
Local	Self-hosted, zero cost	Privacy-first, offline, development

Notes:

Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider and runtime auto-discovered models that vary per installation.

Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

Alias	Resolves To
`sonnet`	`claude-sonnet-4-20250514`
`claude-sonnet`	`claude-sonnet-4-20250514`
`haiku`	`claude-haiku-4-5-20251001`
`claude-haiku`	`claude-haiku-4-5-20251001`
`opus`	`claude-opus-4-20250514`
`claude-opus`	`claude-opus-4-20250514`
`gpt4`	`gpt-4o`
`gpt4o`	`gpt-4o`
`gpt4-mini`	`gpt-4o-mini`
`flash`	`gemini-2.5-flash`
`gemini-flash`	`gemini-2.5-flash`
`gemini-pro`	`gemini-2.5-pro`
`deepseek`	`deepseek-chat`
`llama`	`llama-3.3-70b-versatile`
`llama-70b`	`llama-3.3-70b-versatile`
`mixtral`	`mixtral-8x7b-32768`
`mistral`	`mistral-large-latest`
`codestral`	`codestral-latest`
`grok`	`grok-2`
`grok-mini`	`grok-2-mini`
`sonar`	`sonar-pro`
`jamba`	`jamba-1.5-large`
`command-r`	`command-r-plus`

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
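Case-insensitive alias resolution can be sketched as a lowercase map lookup. This is a minimal illustration, not the catalog's actual code: `resolve_alias` and `sample_aliases` are hypothetical names, and passing unknown names through unchanged is an assumption.

```rust
use std::collections::HashMap;

/// Resolve a user-supplied model name to a canonical ID.
/// Aliases match case-insensitively. Assumption: names that are not
/// aliases pass through unchanged.
fn resolve_alias(aliases: &HashMap<&str, &str>, name: &str) -> String {
    let key = name.to_ascii_lowercase();
    match aliases.get(key.as_str()) {
        Some(canonical) => canonical.to_string(),
        None => name.to_string(),
    }
}

/// A few entries from the alias table above.
fn sample_aliases() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("sonnet", "claude-sonnet-4-20250514"),
        ("opus", "claude-opus-4-20250514"),
        ("flash", "gemini-2.5-flash"),
    ])
}
```

With this table, "Sonnet" and "SONNET" both resolve to claude-sonnet-4-20250514.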

Per-Agent Model Override

Each agent in your config.toml can specify its own model, overriding the global default:

# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed

When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.

Model Routing

LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.

How It Works

The ModelRouter scores each incoming CompletionRequest based on heuristics
The score maps to a TaskComplexity tier: Simple, Medium, or Complex
Each tier has a pre-configured model

Scoring Heuristics

Signal	Weight	Logic
Total message length	1 point per ~4 chars	Rough token proxy
Tool availability	+20 per tool defined	Tools imply multi-step work
Code markers	+30 per marker found	Backticks, `fn`, `def`, `class`, `import`, `function`, `async`, `await`, `struct`, `impl`, `return`
Conversation depth	+15 per message beyond the 10th	Deep context = harder reasoning
System prompt length	+1 per 10 chars beyond the first 500	Long system prompts imply complex tasks

Thresholds

Complexity	Score Range	Default Model
Simple	`score < 100`	`claude-haiku-4-5-20251001`
Medium	`100 <= score < 500`	`claude-sonnet-4-20250514`
Complex	`score >= 500`	`claude-sonnet-4-20250514`
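The heuristics and thresholds above can be combined into a small scoring function. The following is a sketch, not the ModelRouter source: the function names are hypothetical, markers are matched by naive substring search, and each marker is assumed to count once if it appears anywhere in the conversation.

```rust
/// Complexity tiers from the thresholds table.
#[derive(Debug, PartialEq)]
enum TaskComplexity {
    Simple,
    Medium,
    Complex,
}

/// Markers from the heuristics table; "`" stands for backticks.
const CODE_MARKERS: [&str; 11] = [
    "`", "fn", "def", "class", "import", "function", "async", "await", "struct", "impl", "return",
];

/// Score a request from its messages, tool count, and system prompt.
/// Assumption: a marker counts once if any message contains it.
fn score_request(messages: &[&str], tool_count: usize, system_prompt: &str) -> usize {
    let total_chars: usize = messages.iter().map(|m| m.chars().count()).sum();
    let length_points = total_chars / 4; // roughly one point per 4 chars
    let tool_points = 20 * tool_count;
    let marker_points = 30 * CODE_MARKERS
        .iter()
        .filter(|marker| messages.iter().any(|m| m.contains(**marker)))
        .count();
    let depth_points = 15 * messages.len().saturating_sub(10);
    let prompt_points = system_prompt.chars().count().saturating_sub(500) / 10;
    length_points + tool_points + marker_points + depth_points + prompt_points
}

/// Map a score to a tier using the default thresholds (100 and 500).
fn classify(score: usize) -> TaskComplexity {
    if score < 100 {
        TaskComplexity::Simple
    } else if score < 500 {
        TaskComplexity::Medium
    } else {
        TaskComplexity::Complex
    }
}
```

For example, a single 27-character message containing `def` and `return`, sent with three tools defined, scores 6 + 60 + 60 = 126 under these assumptions and lands in the Medium tier.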

Configuration

Model routing is configured per agent in the agent manifest, not as a top-level section in ~/.librefang/config.toml. A top-level [routing] section in config.toml is ignored (the kernel logs Unknown config field (ignored) field="routing").

# In an agent manifest file
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

The router also integrates with the model catalog:

validate_models() checks that all configured model IDs exist in the catalog
resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")

Cost Tracking

LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

Per-Response Cost Estimation

After each LLM call, cost is calculated as:

cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate
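As a worked example, the formula translates directly into code. `estimate_cost` is a hypothetical helper name; the rates are USD per million tokens, as in the catalog.

```rust
/// Estimate the USD cost of one call from token counts and per-million rates.
fn estimate_cost(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_rate
        + (output_tokens as f64 / 1_000_000.0) * output_rate
}
```

A call with 1,200 input tokens and 340 output tokens at $3.00 / $15.00 per million costs 0.0036 + 0.0051 = $0.0087.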

The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.

Cost Rates (per million tokens)

Model Pattern	Input $/M	Output $/M
`haiku`	$0.25	$1.25
`sonnet`	$3.00	$15.00
`opus`	$15.00	$75.00
`gpt-4o-mini`	$0.15	$0.60
`gpt-4o`	$2.50	$10.00
`gpt-4.1-nano`	$0.10	$0.40
`gpt-4.1-mini`	$0.40	$1.60
`gpt-4.1`	$2.00	$8.00
`o3-mini`	$1.10	$4.40
`gemini-2.5-pro`	$1.25	$10.00
`gemini-2.5-flash`	$0.15	$0.60
`gemini-2.0-flash`	$0.10	$0.40
`deepseek-reasoner` / `deepseek-r1`	$0.55	$2.19
`deepseek`	$0.27	$1.10
`cerebras`	$0.06	$0.06
`sambanova`	$0.06	$0.06
`replicate`	$0.40	$0.40
`llama` / `mixtral`	$0.05	$0.10
`qwen`	$0.20	$0.60
`mistral-large*`	$2.00	$6.00
`mistral` (other)	$0.10	$0.30
`command-r-plus`	$2.50	$10.00
`command-r`	$0.15	$0.60
`sonar-pro`	$3.00	$15.00
`sonar` (other)	$1.00	$5.00
`grok-2-mini` / `grok-mini`	$0.30	$0.50
`grok` (other)	$2.00	$10.00
`jamba`	$2.00	$8.00
Default (unknown)	$1.00	$3.00
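The fallback heuristic can be read as an ordered list where the first matching pattern wins, which is why specific patterns such as `gpt-4o-mini` are listed before `gpt-4o`. The sketch below assumes case-insensitive substring matching and first-match-wins ordering; neither is confirmed by the MeteringEngine source, and `fallback_rate` is a hypothetical name.

```rust
/// (pattern, input $/M, output $/M) in match order; a subset of the table above.
const FALLBACK_RATES: [(&str, f64, f64); 6] = [
    ("haiku", 0.25, 1.25),
    ("sonnet", 3.00, 15.00),
    ("gpt-4o-mini", 0.15, 0.60),
    ("gpt-4o", 2.50, 10.00),
    ("mistral-large", 2.00, 6.00),
    ("mistral", 0.10, 0.30),
];

/// Rate used when no pattern matches.
const DEFAULT_RATE: (f64, f64) = (1.00, 3.00);

/// Return (input, output) rates for a model that is missing from the catalog.
/// Assumption: the first pattern contained in the lowercased ID wins.
fn fallback_rate(model_id: &str) -> (f64, f64) {
    let id = model_id.to_ascii_lowercase();
    FALLBACK_RATES
        .iter()
        .find(|(pattern, _, _)| id.contains(*pattern))
        .map(|&(_, input, output)| (input, output))
        .unwrap_or(DEFAULT_RATE)
}
```

If the broad patterns came first, an ID containing `gpt-4o-mini` would be billed at the `gpt-4o` rate, so the order of the list matters.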

Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.

# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00   # cap at $5/hour
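One way to picture the hourly check is a rolling one-hour window of recorded costs. This is a sketch under assumptions: the document does not say whether the real window is rolling or fixed, `HourlyQuota` is a hypothetical type, and timestamps are plain seconds to keep the example deterministic.

```rust
use std::collections::VecDeque;

/// Rolling one-hour spend tracker (illustrative, not LibreFang's implementation).
struct HourlyQuota {
    max_cost_per_hour_usd: f64,
    entries: VecDeque<(u64, f64)>, // (timestamp_secs, cost_usd)
}

impl HourlyQuota {
    fn new(max_cost_per_hour_usd: f64) -> Self {
        Self { max_cost_per_hour_usd, entries: VecDeque::new() }
    }

    /// Record the cost of a completed call.
    fn record(&mut self, now_secs: u64, cost_usd: f64) {
        self.entries.push_back((now_secs, cost_usd));
    }

    /// Ok(()) if another call is allowed, Err with a message otherwise.
    fn check(&mut self, now_secs: u64) -> Result<(), String> {
        // Drop entries that are one hour old or older.
        while let Some(&(ts, _)) = self.entries.front() {
            if now_secs.saturating_sub(ts) >= 3600 {
                self.entries.pop_front();
            } else {
                break;
            }
        }
        let spent: f64 = self.entries.iter().map(|&(_, cost)| cost).sum();
        if spent >= self.max_cost_per_hour_usd {
            Err(format!("QuotaExceeded: ${spent:.2} spent in the last hour"))
        } else {
            Ok(())
        }
    }
}
```

With a $5.00 cap, two calls costing $3.00 and $2.50 inside the same hour cause the next check to fail until the first entry ages out.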

The usage footer (when enabled) appends cost information to each response:

> Cost: $0.0087 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514

Fallback Providers

The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

Behavior

On success: returns immediately
On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT failover, because the primary should be retried after backoff)
On all other errors: logs a warning and tries the next driver in the chain
If all drivers fail: returns the last error
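The four rules above can be sketched as a loop over drivers. This is not the FallbackDriver source: `LlmError` and `complete_with_fallback` are hypothetical, and each driver is modelled as a closure so the example stays self-contained.

```rust
/// Simplified error type for the sketch (not LibreFang's real error enum).
#[derive(Debug, Clone, PartialEq)]
enum LlmError {
    /// HTTP 429 or 529: the caller should back off and retry the primary.
    RateLimited(u16),
    /// Any other failure.
    Other(String),
}

/// Try each driver in order, following the rules listed above.
fn complete_with_fallback(
    drivers: &[&dyn Fn() -> Result<String, LlmError>],
) -> Result<String, LlmError> {
    let mut last_error = LlmError::Other("no drivers configured".to_string());
    for (index, driver) in drivers.iter().enumerate() {
        match driver() {
            // Success: return immediately.
            Ok(text) => return Ok(text),
            // Rate limit or overload: bubble up, do not fail over.
            Err(err @ LlmError::RateLimited(_)) => return Err(err),
            // Anything else: warn and try the next driver.
            Err(err) => {
                eprintln!("warning: driver {index} failed: {err:?}; trying next");
                last_error = err;
            }
        }
    }
    // Every driver failed: return the last error seen.
    Err(last_error)
}
```

Note that a rate-limited primary short-circuits the chain even when a healthy fallback is available, which matches the retry-after-backoff rationale given above.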

Configuration

Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.

# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]

The fallback driver creates a chain: AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq).

API Endpoints

List All Models

GET /api/models

Returns the complete model catalog with metadata, pricing, and feature flags.

Response:

[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]

Get Specific Model

GET /api/models/{id}

Returns a single model entry. Supports both canonical IDs and aliases.

GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514

List Aliases

GET /api/models/aliases

Returns a map of all alias-to-canonical-ID mappings.

Response:

{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}

List Providers

GET /api/providers

Returns all 49 providers with auth status and model counts.

Response:

[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]

Auth status values: Configured, Missing, NotRequired.

Set Provider API Key

POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }

Configures an API key for a provider at runtime (stored as a Zeroizing<String>, wiped from memory on drop).

Remove Provider API Key

DELETE /api/providers/{name}/key

Removes the configured API key for a provider.

Test Provider Connection

POST /api/providers/{name}/test

Sends a minimal test request to verify the provider is reachable and the API key is valid.

Channel Commands

Two chat commands are available in any channel for inspecting models and providers:

`/models`

Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).

/models

Example output:

Available models (9):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx

`/providers`

Lists all 49 providers with their authentication status.

/providers

Example output:

LLM Providers (49):

  Anthropic          ANTHROPIC_API_KEY       Configured    3 models
  OpenAI             OPENAI_API_KEY          Missing       6 models
  Google Gemini      GEMINI_API_KEY          Configured    3 models
  DeepSeek           DEEPSEEK_API_KEY        Missing       2 models
  Groq               GROQ_API_KEY            Configured    4 models
  Ollama             (no key needed)         Ready         3 models
  vLLM               (no key needed)         Ready         1 model
  LM Studio          (no key needed)         Ready         1 model
  ...

Environment Variables Summary

Quick reference for all provider environment variables:

Provider	Env Var	Required
Anthropic	`ANTHROPIC_API_KEY`	Yes
OpenAI	`OPENAI_API_KEY`	Yes
Google Gemini	`GEMINI_API_KEY` or `GOOGLE_API_KEY`	Yes
DeepSeek	`DEEPSEEK_API_KEY`	Yes
Groq	`GROQ_API_KEY`	Yes
OpenRouter	`OPENROUTER_API_KEY`	Yes
Mistral AI	`MISTRAL_API_KEY`	Yes
Together AI	`TOGETHER_API_KEY`	Yes
Fireworks AI	`FIREWORKS_API_KEY`	Yes
Ollama	`OLLAMA_API_KEY`	No
vLLM	`VLLM_API_KEY`	No
LM Studio	`LMSTUDIO_API_KEY`	No
Perplexity AI	`PERPLEXITY_API_KEY`	Yes
Cohere	`COHERE_API_KEY`	Yes
AI21 Labs	`AI21_API_KEY`	Yes
Cerebras	`CEREBRAS_API_KEY`	Yes
SambaNova	`SAMBANOVA_API_KEY`	Yes
Hugging Face	`HF_API_KEY`	Yes
xAI	`XAI_API_KEY`	Yes
Replicate	`REPLICATE_API_TOKEN`	Yes
Claude Code	`ANTHROPIC_API_KEY`	Yes
NVIDIA NIM	`NVIDIA_API_KEY`	Yes
Voyage AI	`VOYAGE_API_KEY`	Yes
Anyscale	`ANYSCALE_API_KEY`	Yes
DeepInfra	`DEEPINFRA_API_KEY`	Yes
Azure OpenAI	`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`	Yes
Amazon Bedrock	`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`	Yes
Google Vertex AI	`GOOGLE_APPLICATION_CREDENTIALS`, `VERTEX_PROJECT`, `VERTEX_LOCATION`	Yes

Security Notes

All API keys are stored as Zeroizing<String> -- the key material is automatically overwritten with zeros when the value is dropped from memory.
Auth detection (detect_auth()) only checks std::env::var() for presence -- it never reads or logs the actual secret value.
Provider API keys set via the REST API (POST /api/providers/{name}/key) follow the same zeroization policy.
The health endpoint (/api/health) never exposes provider auth status or API keys. Detailed info is behind /api/health/detail which requires authentication.
All DriverConfig and KernelConfig structs implement Debug with secret redaction -- API keys are printed as "***" in logs.
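Secret redaction in Debug output and a presence-only environment check can be illustrated with the standard library alone. The struct below is a hypothetical stand-in for DriverConfig; it uses a plain String where the real code is described as using Zeroizing<String>, and `key_is_present` is an illustrative name rather than the real detect_auth().

```rust
use std::fmt;

/// Hypothetical config struct; the real DriverConfig has more fields.
struct DriverConfig {
    provider: String,
    api_key: String,
}

impl fmt::Debug for DriverConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DriverConfig")
            .field("provider", &self.provider)
            .field("api_key", &"***") // never print the secret itself
            .finish()
    }
}

/// Presence-only check: reports whether the variable is set and
/// discards the value without logging or returning it.
fn key_is_present(env_var: &str) -> bool {
    std::env::var(env_var).is_ok()
}
```

Logging such a struct with `{:?}` shows the provider name and a masked key, so a stray debug log cannot leak the secret.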

Provider Fallback Chain

Source: librefang-llm-drivers/src/drivers/fallback_chain.rs

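A FallbackChain lets you configure an ordered list of LLM providers that LibreFang tries in sequence. When a provider fails, the runtime classifies the failure and decides whether to advance to the next provider or surface the error immediately. Its rules differ from those of the per-agent FallbackDriver described under Fallback Providers: here a rate limit advances the chain, while an auth error stops it.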

Failover Reasons

LibreFang classifies each failure before deciding whether to fall forward:

`FailoverReason`	Description	Falls over?
`RateLimit`	HTTP 429 or `x-ratelimit-remaining: 0`	Yes — try next provider
`Timeout`	Request exceeded the configured deadline	Yes
`ServerError`	HTTP 5xx response	Yes
`ModelNotFound`	HTTP 404 for the requested model ID	Yes
`AuthError`	HTTP 401 / 403	No — misconfigured key, stop immediately
`Unknown`	Any other error	Yes

AuthError deliberately does not fall forward: a missing or revoked key requires operator attention, and silently retrying with another provider can mask the problem.
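The classification table can be sketched as a match on the response status. This is an illustration, not the contents of fallback_chain.rs: the function names and parameters are hypothetical, and the precedence between overlapping conditions (for example, a 401 that also reports zero remaining quota) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FailoverReason {
    RateLimit,
    Timeout,
    ServerError,
    ModelNotFound,
    AuthError,
    Unknown,
}

/// Classify a failed call. `status` is None when no HTTP response arrived.
/// Assumption: auth errors take precedence over rate-limit headers.
fn classify_failure(
    status: Option<u16>,
    timed_out: bool,
    ratelimit_remaining: Option<u64>,
) -> FailoverReason {
    if timed_out {
        return FailoverReason::Timeout;
    }
    match status {
        Some(401) | Some(403) => FailoverReason::AuthError,
        Some(429) => FailoverReason::RateLimit,
        Some(_) if ratelimit_remaining == Some(0) => FailoverReason::RateLimit,
        Some(404) => FailoverReason::ModelNotFound,
        Some(500..=599) => FailoverReason::ServerError,
        _ => FailoverReason::Unknown,
    }
}

/// Every reason advances the chain except AuthError.
fn should_fail_over(reason: FailoverReason) -> bool {
    reason != FailoverReason::AuthError
}
```

Keeping the decision in a single `should_fail_over` function makes the one exception easy to audit.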

Configuration

Define the chain under [[providers.fallback_chain]] in ~/.librefang/config.toml:

[[providers.fallback_chain]]
name = "anthropic"
model = "claude-sonnet-4-5"

[[providers.fallback_chain]]
name = "openai"
model = "gpt-4o"

[[providers.fallback_chain]]
name = "openai-compatible"
base_url = "http://localhost:11434/v1"
model = "llama3.1:8b"

LibreFang works through the list in order. If all providers fail for eligible reasons, the final error from the last provider is returned to the caller.

Interaction with Credential Pool

Each entry in the fallback chain can itself have multiple keys managed by a Credential Pool. Rate-limit exhaustion on one key causes the pool to rotate to the next key before the chain advances to the next provider.

Credential Pool

Source: librefang-llm-drivers/src/credential_pool.rs

A CredentialPool manages multiple API keys for a single provider and rotates between them automatically. This extends effective rate limits without manual intervention and provides continuity when individual keys are temporarily exhausted.

Rotation Strategies

Strategy	Behavior
`FillFirst`	Always use the first non-exhausted key. Keys are tried in order; exhausted keys are skipped.
`RoundRobin`	Rotate through keys in a fixed cycle, skipping exhausted ones.
`Random`	Pick a random non-exhausted key on each request.
`LeastUsed`	Pick the key with the fewest total requests, skipping exhausted ones.

The default strategy is RoundRobin.
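Three of the four strategies can be sketched with the standard library; Random is omitted because it needs a random-number source. The types below are hypothetical simplifications of the credential pool, and the cursor handling is an assumption about how a round-robin position might be tracked.

```rust
#[derive(Debug, Clone, Copy)]
enum RotationStrategy {
    FillFirst,
    RoundRobin,
    LeastUsed,
}

struct KeyState {
    requests: u64,
    exhausted: bool,
}

struct CredentialPool {
    keys: Vec<KeyState>,
    strategy: RotationStrategy,
    cursor: usize, // next round-robin position
}

impl CredentialPool {
    /// Return the index of the key to use next, or None if all are exhausted.
    fn next_key(&mut self) -> Option<usize> {
        let n = self.keys.len();
        let picked = match self.strategy {
            // First non-exhausted key, in configured order.
            RotationStrategy::FillFirst => (0..n).find(|&i| !self.keys[i].exhausted),
            // Walk forward from the cursor, skipping exhausted keys.
            RotationStrategy::RoundRobin => (0..n)
                .map(|offset| (self.cursor + offset) % n)
                .find(|&i| !self.keys[i].exhausted),
            // Non-exhausted key with the fewest requests so far.
            RotationStrategy::LeastUsed => (0..n)
                .filter(|&i| !self.keys[i].exhausted)
                .min_by_key(|&i| self.keys[i].requests),
        };
        if let Some(i) = picked {
            self.keys[i].requests += 1;
            self.cursor = (i + 1) % n;
        }
        picked
    }
}
```

With three keys where the second is exhausted, round-robin alternates between the first and third, while fill-first keeps returning the first.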

Configuration

[providers.anthropic]
rotation_strategy = "round_robin"  # fill_first | round_robin | random | least_used

[[providers.anthropic.keys]]
key = "sk-ant-key-1"

[[providers.anthropic.keys]]
key = "sk-ant-key-2"

[[providers.anthropic.keys]]
key = "sk-ant-key-3"

Exhaustion and Cooldown

A key is marked exhausted when the response headers report zero remaining quota (x-ratelimit-remaining-requests: 0 or x-ratelimit-remaining-tokens: 0). Exhausted keys are skipped by all rotation strategies.

After 1 hour, an exhausted key automatically re-enters the active pool. This cooldown is not configurable in the current release; it corresponds to the standard window used by most OpenAI-compatible providers.

If all keys in the pool are simultaneously exhausted, the CredentialPool returns a RateLimit error, which triggers fallback to the next provider in the chain (if configured).
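The exhaustion and cooldown rules can be modelled with a timestamp per key. A minimal sketch, assuming a hypothetical `PooledKey` type; the caller passes the current time explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Fixed one-hour cooldown, as described above.
const COOLDOWN: Duration = Duration::from_secs(60 * 60);

struct PooledKey {
    exhausted_at: Option<Instant>,
}

impl PooledKey {
    /// Mark the key exhausted when either remaining counter is zero.
    fn observe(
        &mut self,
        remaining_requests: Option<u64>,
        remaining_tokens: Option<u64>,
        now: Instant,
    ) {
        if remaining_requests == Some(0) || remaining_tokens == Some(0) {
            self.exhausted_at = Some(now);
        }
    }

    /// A key is usable if it was never exhausted or its cooldown has elapsed.
    fn is_available(&self, now: Instant) -> bool {
        match self.exhausted_at {
            None => true,
            Some(at) => now.saturating_duration_since(at) >= COOLDOWN,
        }
    }
}
```

A pool built on this check would report a rate-limit error only when `is_available` is false for every key.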

Usage Tracking

Each key in the pool tracks:

Total request count
Total token count (prompt + completion)
Exhaustion timestamp (if currently exhausted)

These counters are in-memory only and reset when the daemon restarts. Persistent usage tracking is available through the budget ledger (/api/providers/{name}/usage).

Rate Limit Tracker

Source: librefang-llm-drivers/src/rate_limit_tracker.rs

After every LLM API response, LibreFang parses the provider's rate-limit headers into a RateLimitSnapshot. This snapshot is used by the credential pool to detect exhaustion and is surfaced in the dashboard and CLI for operator visibility.

Parsed Headers

LibreFang recognizes 21 header variants across major providers:

Header	Meaning
`x-ratelimit-limit-requests`	Per-window request ceiling
`x-ratelimit-remaining-requests`	Requests remaining in current window
`x-ratelimit-reset-requests`	Time until request quota resets
`x-ratelimit-limit-tokens`	Per-window token ceiling
`x-ratelimit-remaining-tokens`	Tokens remaining in current window
`x-ratelimit-reset-tokens`	Time until token quota resets
`ratelimit-limit`	Generic combined limit (some providers)
`ratelimit-remaining`	Generic combined remaining
`ratelimit-reset`	Generic reset time
`retry-after`	Seconds until retry is safe (429 responses)
`x-rate-limit-limit-requests`	Alternate spelling variant
`x-rate-limit-remaining-requests`	Alternate spelling variant
… and 9 more provider-specific variants
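Parsing can be sketched as trying each known spelling of a header until one yields a number. The struct fields and function names below are hypothetical and cover only a few of the listed headers; header names are assumed to be lower-cased before lookup.

```rust
use std::collections::HashMap;

/// Illustrative subset of a rate-limit snapshot.
#[derive(Debug, Default, PartialEq)]
struct RateLimitSnapshot {
    limit_requests: Option<u64>,
    remaining_requests: Option<u64>,
    limit_tokens: Option<u64>,
    remaining_tokens: Option<u64>,
}

/// Return the first header in `names` that is present and parses as an integer.
fn first_numeric(headers: &HashMap<String, String>, names: &[&str]) -> Option<u64> {
    names
        .iter()
        .find_map(|name| headers.get(*name)?.trim().parse::<u64>().ok())
}

/// Build a snapshot from response headers (names already lower-cased).
fn parse_rate_limit_headers(headers: &HashMap<String, String>) -> RateLimitSnapshot {
    RateLimitSnapshot {
        limit_requests: first_numeric(
            headers,
            &["x-ratelimit-limit-requests", "x-rate-limit-limit-requests"],
        ),
        remaining_requests: first_numeric(
            headers,
            &["x-ratelimit-remaining-requests", "x-rate-limit-remaining-requests"],
        ),
        limit_tokens: first_numeric(headers, &["x-ratelimit-limit-tokens"]),
        remaining_tokens: first_numeric(headers, &["x-ratelimit-remaining-tokens"]),
    }
}
```

Fields stay `None` when a provider omits the header, which lines up with the null fields in the JSON snapshot.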

ASCII Progress Bar

The CLI and dashboard display a compact progress bar for each active provider:

anthropic  [████████████████░░░░]  req: 847/1000  tok: 42.3K/90K
openai     [██████████████░░░░░░]  req: 712/1000  tok: 61.1K/90K

The bar width is 20 characters. Each filled block (█) represents 5% of the limit consumed. An empty bar means no quota data is available from the provider.
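Rendering the bar is a small integer calculation. The sketch below assumes that partial blocks round down and that a missing or zero limit produces an empty bar; `render_bar` is a hypothetical helper, not the CLI's actual function.

```rust
/// Render a 20-character usage bar; each filled block is 5% of the limit consumed.
/// Assumptions: partial blocks round down; no limit means an empty bar.
fn render_bar(used: u64, limit: Option<u64>) -> String {
    const WIDTH: usize = 20;
    let filled = match limit {
        Some(limit) if limit > 0 => {
            // Cap at the limit so the bar never overflows its width.
            let capped = used.min(limit) as u128;
            (capped * WIDTH as u128 / limit as u128) as usize
        }
        _ => 0,
    };
    format!("[{}{}]", "█".repeat(filled), "░".repeat(WIDTH - filled))
}
```

Under these assumptions, 712 of 1,000 requests consumed fills 14 of the 20 blocks.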

Accessing the Snapshot

# Current rate-limit snapshot for all providers
curl http://127.0.0.1:4545/api/providers/rate-limits

{
  "anthropic": {
    "requests": { "limit": 1000, "remaining": 153, "reset_in_seconds": 34 },
    "tokens":   { "limit": 90000, "remaining": 47700, "reset_in_seconds": 34 }
  }
}

Fields are null when the provider did not return the corresponding header.