Provider Management
This page covers provider catalog loading, model metadata, aliases, routing, spend controls, fallback behavior, REST endpoints, channel commands, security notes, credential pools, and rate-limit tracking.
Included Topics
- Dynamic Provider Loading
- Model Catalog
- Model Aliases
- Per-Agent Model Override
- Model Routing
- Cost Tracking
- Fallback Providers
- API Endpoints
- Channel Commands
- Environment Variables Summary
- Security Notes
- Provider Fallback Chain
- Credential Pool
- Rate Limit Tracker
Dynamic Provider Loading
Official provider definitions are pre-installed to ~/.librefang/providers/ from the registry. To add or override providers, place custom .toml files in the same directory. Each file defines one provider:
# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true
[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false
Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.
Custom Image-Generation Providers
A media driver name not in the builtin list (openai, gemini, elevenlabs, minimax, google_tts) used to fail with "Unknown media provider". Custom providers with a configured URL now fall through to a generic OpenAI-compatible image driver:
# config.toml
[provider_urls]
volcengine = "https://open.volcengineapi.com/v1"
tongyi = "https://dashscope.aliyuncs.com/compatible-mode/v1"
export VOLCENGINE_API_KEY="sk-..."
export TONGYI_API_KEY="sk-..."
The driver expects standard OpenAI Images API request/response shape (/images/generations endpoint, b64_json or url returned). Set [media] image_provider = "volcengine" (or whichever id you used in [provider_urls]) to route image-generation calls there.
Driver-side File Upload
The Moonshot (Kimi) driver uploads attachments to /v1/files before forwarding the request to /v1/chat/completions, so messages with image / PDF / text attachments work end-to-end without you wiring per-attachment marshalling. Files are written to std::env::temp_dir(), content-type validated, size-limited, and filename-sanitised before upload. This is transparent — no extra config; just attach files via POST /api/agents/{id}/upload or the dashboard chat as usual.
Provider Regions
Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:
# In a provider's registry TOML:
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY" # Optional: override the default API key env var
Select a region in config.toml:
[provider_regions]
qwen = "intl"
minimax = "china"
Priority: Region selections are applied before explicit [provider_urls] entries. If both are set for the same provider, provider_urls wins.
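The precedence rule above can be sketched as a small lookup. This is only an illustration of the ordering (default URL, then region selection, then explicit URL); the function name, argument layout, the default URL, and the proxy URL are hypothetical and do not mirror LibreFang internals.

```python
def resolve_base_url(provider_id, default_url, registry_regions,
                     provider_regions, provider_urls):
    """Return the base URL for a provider: explicit URL > selected region > default."""
    url = default_url
    selected = provider_regions.get(provider_id)
    if selected is not None:
        region = registry_regions.get(provider_id, {}).get(selected)
        if region is not None:
            url = region["base_url"]
    # Explicit [provider_urls] entries are applied last, so they win.
    return provider_urls.get(provider_id, url)


# Region data as it might appear after reading a provider's registry TOML.
registry_regions = {
    "qwen": {"intl": {"base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"}},
}
provider_regions = {"qwen": "intl"}
default_url = "https://dashscope.aliyuncs.com/compatible-mode/v1"  # assumed default

region_only = resolve_base_url("qwen", default_url, registry_regions, provider_regions, {})
with_override = resolve_base_url(
    "qwen", default_url, registry_regions, provider_regions,
    {"qwen": "https://qwen-proxy.example.internal/v1"},  # hypothetical override
)
print(region_only)
print(with_override)
```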
Model Catalog
A representative subset of the 230+ builtin models, grouped by provider. Pricing is in USD per million tokens.
| # | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes |
| 2 | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 3 | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes |
| 4 | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes |
| 5 | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 6 | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No |
| 7 | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes |
| 8 | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes |
| 9 | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No |
| 10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes |
| 13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No |
| 14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No |
| 15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No |
| 16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No |
| 17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No |
| 18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No |
| 19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No |
| 23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No |
| 24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No |
| 25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No |
| 28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No |
| 29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No |
| 31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No |
| 32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No |
| 33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No |
| 34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No |
| 35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No |
| 36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No |
| 37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No |
| 38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No |
| 40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No |
| 43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No |
| 44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No |
| 45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No |
| 46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No |
| 47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No |
| 49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes |
| 51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No |
| 52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No |
| 53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No |
Model Tiers:
| Tier | Description | Typical Use |
|---|---|---|
| Frontier | Most capable, highest cost | Orchestration, architecture, security audits |
| Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis |
| Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks |
| Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks |
| Local | Self-hosted, zero cost | Privacy-first, offline, development |
Notes:
- Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
- The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider and runtime auto-discovered models that vary per installation.
Model Aliases
All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.
| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-20250514 |
| claude-sonnet | claude-sonnet-4-20250514 |
| haiku | claude-haiku-4-5-20251001 |
| claude-haiku | claude-haiku-4-5-20251001 |
| opus | claude-opus-4-20250514 |
| claude-opus | claude-opus-4-20250514 |
| gpt4 | gpt-4o |
| gpt4o | gpt-4o |
| gpt4-mini | gpt-4o-mini |
| flash | gemini-2.5-flash |
| gemini-flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek-chat |
| llama | llama-3.3-70b-versatile |
| llama-70b | llama-3.3-70b-versatile |
| mixtral | mixtral-8x7b-32768 |
| mistral | mistral-large-latest |
| codestral | codestral-latest |
| grok | grok-2 |
| grok-mini | grok-2-mini |
| sonar | sonar-pro |
| jamba | jamba-1.5-large |
| command-r | command-r-plus |
You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
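A minimal sketch of case-insensitive alias resolution, using a handful of rows from the table above. The `resolve_model` helper is hypothetical, and returning unknown names unchanged (so canonical IDs pass straight through) is an assumption about the lookup's behavior.

```python
# A few alias rows from the table above.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "opus": "claude-opus-4-20250514",
    "haiku": "claude-haiku-4-5-20251001",
    "flash": "gemini-2.5-flash",
    "gpt4": "gpt-4o",
}


def resolve_model(name: str) -> str:
    """Map an alias to its canonical model ID, ignoring case."""
    # Assumption: anything that is not an alias is treated as a canonical ID.
    return ALIASES.get(name.strip().lower(), name)


print(resolve_model("Sonnet"))
print(resolve_model("gemini-2.5-pro"))
```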
Per-Agent Model Override
Each agent in your config.toml can specify its own model, overriding the global default:
# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"
# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus" # alias for claude-opus-4-20250514
[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile" # cheap Groq model for simple ops
[[agents]]
name = "coder"
model = "gemini-2.5-flash" # fast + cheap + 1M context
[[agents]]
name = "researcher"
model = "sonar-pro" # Perplexity with built-in web search
# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514" # never auto-routed
When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.
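One way to picture the precedence described here is the sketch below. Only the first rule (a pinned model beats routing) comes from the text above; the relative order of a router result, the agent's own `model`, and the global default is an assumption, and `effective_model` is a hypothetical helper.

```python
def effective_model(agent: dict, default_model: str, routed_model=None) -> str:
    """Pick the model for one request."""
    pinned = agent.get("pinned_model")
    if pinned:
        # Documented behavior: a pinned model is never auto-routed.
        return pinned
    if routed_model:
        # Assumption: a router decision, when present, is used next.
        return routed_model
    # Assumption: otherwise the agent's model, then the global default.
    return agent.get("model", default_model)


default = "claude-sonnet-4-20250514"
print(effective_model({"name": "production-bot", "pinned_model": default}, default, "gemini-2.5-flash"))
print(effective_model({"name": "ops", "model": "llama-3.3-70b-versatile"}, default))
```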
Model Routing
LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.
How It Works
- The ModelRouter scores each incoming CompletionRequest based on heuristics
- The score maps to a TaskComplexity tier: Simple, Medium, or Complex
- Each tier has a pre-configured model
Scoring Heuristics
| Signal | Weight | Logic |
|---|---|---|
| Total message length | 1 point per ~4 chars | Rough token proxy |
| Tool availability | +20 per tool defined | Tools imply multi-step work |
| Code markers | +30 per marker found | Backticks, fn, def, class, import, function, async, await, struct, impl, return |
| Conversation depth | +15 per message > 10 | Deep context = harder reasoning |
| System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks |
Thresholds
| Complexity | Score Range | Default Model |
|---|---|---|
| Simple | score < 100 | claude-haiku-4-5-20251001 |
| Medium | 100 <= score < 500 | claude-sonnet-4-20250514 |
| Complex | score >= 500 | claude-sonnet-4-20250514 |
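The heuristics and thresholds above can be approximated in a few lines. This is a sketch under stated assumptions, not the ModelRouter's code: markers are matched as plain substrings and counted once each, the system prompt is scored separately from the messages, and the function names are hypothetical.

```python
# Marker list from the table above; matching them as substrings with a trailing
# space is an assumption made to avoid hits inside ordinary words.
CODE_MARKERS = ["`", "fn ", "def ", "class ", "import ", "function ",
                "async ", "await ", "struct ", "impl ", "return "]


def score_request(messages, system_prompt="", tool_count=0):
    """Score a request using the five signals from the heuristics table."""
    text = "".join(messages)
    score = len(text) // 4                                        # ~4 chars per point
    score += 20 * tool_count                                      # +20 per tool
    score += 30 * sum(1 for m in CODE_MARKERS if m in text)       # +30 per marker present
    score += 15 * max(0, len(messages) - 10)                      # +15 per message beyond 10
    score += max(0, len(system_prompt) - 500) // 10               # +1 per 10 chars beyond 500
    return score


def classify(score, simple_threshold=100, complex_threshold=500):
    """Map a score to a complexity tier using the documented default thresholds."""
    if score < simple_threshold:
        return "Simple"
    if score < complex_threshold:
        return "Medium"
    return "Complex"


greeting = score_request(["hi there"])
snippet = score_request(["def add(a, b): return a + b"], tool_count=2)
print(greeting, classify(greeting))
print(snippet, classify(snippet))
```

Even a one-line code snippet with two tools attached lands in the Medium tier here, which shows how heavily the tool and code-marker weights dominate message length for short inputs.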
Configuration
Model routing is configured per-agent in the agent manifest — not as a top-level section in ~/.librefang/config.toml. Putting [routing] at the top of config.toml will be ignored (the kernel logs Unknown config field (ignored) field="routing").
# In an agent manifest file
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500
The router also integrates with the model catalog:
- validate_models() checks that all configured model IDs exist in the catalog
- resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")
Cost Tracking
LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.
Per-Response Cost Estimation
After each LLM call, cost is calculated as:
cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate
The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
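The formula above translates directly into code. The sketch below looks up rates in a one-entry stand-in for the catalog and otherwise uses the default rate for unknown models listed in the rate table of this section; `CATALOG_RATES` and `estimate_cost` are hypothetical names, and the pattern-matching step is left out here.

```python
# (input $/M, output $/M); one catalog entry, just for the example.
CATALOG_RATES = {"claude-sonnet-4-20250514": (3.00, 15.00)}
DEFAULT_RATES = (1.00, 3.00)  # the "Default (unknown)" row of the rate table


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """cost = input_tokens/1e6 * input_rate + output_tokens/1e6 * output_rate"""
    input_rate, output_rate = CATALOG_RATES.get(model_id, DEFAULT_RATES)
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


cost = estimate_cost("claude-sonnet-4-20250514", 10_000, 2_000)
print(f"${cost:.4f}")
```

For 10,000 input tokens and 2,000 output tokens at $3.00 / $15.00 per million, each side contributes about three cents.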
Cost Rates (per million tokens)
| Model Pattern | Input $/M | Output $/M |
|---|---|---|
| *haiku* | $0.25 | $1.25 |
| *sonnet* | $3.00 | $15.00 |
| *opus* | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.15 | $0.60 |
| gemini-2.0-flash | $0.10 | $0.40 |
| deepseek-reasoner / deepseek-r1 | $0.55 | $2.19 |
| *deepseek* | $0.27 | $1.10 |
| *cerebras* | $0.06 | $0.06 |
| *sambanova* | $0.06 | $0.06 |
| *replicate* | $0.40 | $0.40 |
| *llama* / *mixtral* | $0.05 | $0.10 |
| *qwen* | $0.20 | $0.60 |
| mistral-large* | $2.00 | $6.00 |
| *mistral* (other) | $0.10 | $0.30 |
| command-r-plus | $2.50 | $10.00 |
| command-r | $0.15 | $0.60 |
| sonar-pro | $3.00 | $15.00 |
| *sonar* (other) | $1.00 | $5.00 |
| grok-2-mini / grok-mini | $0.30 | $0.50 |
| *grok* (other) | $2.00 | $10.00 |
| *jamba* | $2.00 | $8.00 |
| Default (unknown) | $1.00 | $3.00 |
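A pattern table like this only works if more specific patterns are checked before broader ones (for example `gpt-4o-mini` before `gpt-4o`). The sketch below shows that idea with a few rows and `fnmatch`-style globs. Treating every row as a substring glob and using first-match-wins ordering are assumptions; the actual heuristic may match differently.

```python
from fnmatch import fnmatchcase

# A few rows from the table above, most specific first.
FALLBACK_RATES = [
    ("*gpt-4o-mini*", 0.15, 0.60),
    ("*gpt-4o*", 2.50, 10.00),
    ("*deepseek-reasoner*", 0.55, 2.19),
    ("*deepseek-r1*", 0.55, 2.19),
    ("*deepseek*", 0.27, 1.10),
    ("*haiku*", 0.25, 1.25),
    ("*sonnet*", 3.00, 15.00),
]
DEFAULT_RATE = (1.00, 3.00)


def fallback_rate(model_id: str):
    """Return (input $/M, output $/M) for a model missing from the catalog."""
    name = model_id.lower()
    for pattern, input_rate, output_rate in FALLBACK_RATES:
        if fnmatchcase(name, pattern):
            return input_rate, output_rate
    return DEFAULT_RATE


print(fallback_rate("openrouter/deepseek/deepseek-r1"))
print(fallback_rate("some-unknown-model"))
```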
Quota Enforcement
Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.
# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00 # cap at $5/hour
The usage footer (when enabled) appends cost information to each response:
> Cost: $0.0087 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
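To make the quota check concrete, here is a minimal sketch of an hourly spend cap. It assumes a rolling 60-minute window and an explicit `now` argument for testability; whether LibreFang uses a rolling or fixed window is not stated above, and `HourlyBudget` is a hypothetical class. Only the `QuotaExceeded` name comes from the text.

```python
from collections import deque


class QuotaExceeded(Exception):
    pass


class HourlyBudget:
    def __init__(self, max_cost_per_hour_usd: float, window_seconds: float = 3600.0):
        self.limit = max_cost_per_hour_usd
        self.window = window_seconds
        self._spend = deque()  # (timestamp, cost) pairs, oldest first

    def spent(self, now: float) -> float:
        """Total cost recorded within the window ending at `now`."""
        while self._spend and now - self._spend[0][0] >= self.window:
            self._spend.popleft()
        return sum(cost for _, cost in self._spend)

    def check(self, now: float) -> None:
        """Call before each LLM request; raises once the cap is exceeded."""
        total = self.spent(now)
        if total > self.limit:
            raise QuotaExceeded(f"spent ${total:.2f} of ${self.limit:.2f} in the last hour")

    def record(self, cost: float, now: float) -> None:
        """Call after each LLM response with its estimated cost."""
        self._spend.append((now, cost))


budget = HourlyBudget(5.00)
budget.check(now=0.0)             # nothing spent yet, passes
budget.record(3.00, now=0.0)
budget.record(2.50, now=600.0)    # $5.50 within the hour: the next check fails
```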
Fallback Providers
The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.
Behavior
- On success: returns immediately
- On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT failover, because the primary should be retried after backoff)
- On all other errors: logs a warning and tries the next driver in the chain
- If all drivers fail: returns the last error
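The four rules above can be sketched as a small loop. Drivers are modelled as plain callables and `DriverError` is a hypothetical exception type carrying an HTTP status; none of this is the FallbackDriver's real interface. The sketch also assumes the 429/529 rule applies to whichever driver in the chain raises it.

```python
import logging

RETRYABLE_STATUSES = {429, 529}  # rate limit / overload: left to the caller's retry logic


class DriverError(Exception):
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def complete_with_fallback(drivers, request):
    """Try each driver in order, following the behavior rules listed above."""
    last_error = None
    for driver in drivers:
        try:
            return driver(request)          # success: return immediately
        except DriverError as err:
            if err.status in RETRYABLE_STATUSES:
                raise                       # bubble up, no failover
            logging.warning("driver failed (%s); trying next driver", err)
            last_error = err
    if last_error is None:
        raise ValueError("no drivers configured")
    raise last_error                        # all drivers failed: last error


def failing(status):
    """Build a stand-in driver that always fails with the given status."""
    def driver(request):
        raise DriverError(status)
    return driver


result = complete_with_fallback([failing(500), lambda request: "ok"], "hello")
print(result)
```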
Configuration
Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.
# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]
The fallback driver creates a chain: AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq).
API Endpoints
List All Models
GET /api/models
Returns the complete model catalog with metadata, pricing, and feature flags.
Response:
[
{
"id": "claude-sonnet-4-20250514",
"display_name": "Claude Sonnet 4",
"provider": "anthropic",
"tier": "Smart",
"context_window": 200000,
"max_output_tokens": 64000,
"input_cost_per_m": 3.0,
"output_cost_per_m": 15.0,
"supports_tools": true,
"supports_vision": true,
"supports_streaming": true,
"aliases": ["sonnet", "claude-sonnet"]
}
]
Get Specific Model
GET /api/models/{id}
Returns a single model entry. Supports both canonical IDs and aliases.
GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514
List Aliases
GET /api/models/aliases
Returns a map of all alias-to-canonical-ID mappings.
Response:
{
"sonnet": "claude-sonnet-4-20250514",
"haiku": "claude-haiku-4-5-20251001",
"flash": "gemini-2.5-flash",
"grok": "grok-2"
}
List Providers
GET /api/providers
Returns all 49 providers with auth status and model counts.
Response:
[
{
"id": "anthropic",
"display_name": "Anthropic",
"api_key_env": "ANTHROPIC_API_KEY",
"base_url": "https://api.anthropic.com",
"key_required": true,
"auth_status": "Configured",
"model_count": 3
},
{
"id": "ollama",
"display_name": "Ollama",
"api_key_env": "OLLAMA_API_KEY",
"base_url": "http://localhost:11434/v1",
"key_required": false,
"auth_status": "NotRequired",
"model_count": 5
}
]
Auth status values: Configured, Missing, NotRequired.
Set Provider API Key
POST /api/providers/{name}/key
Content-Type: application/json
{ "api_key": "sk-..." }
Configures an API key for a provider at runtime (stored as a Zeroizing<String>, wiped from memory on drop).
Remove Provider API Key
DELETE /api/providers/{name}/key
Removes the configured API key for a provider.
Test Provider Connection
POST /api/providers/{name}/test
Sends a minimal test request to verify the provider is reachable and the API key is valid.
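For scripting against the key and test endpoints, the requests can be built with the standard library as sketched below. The base URL reuses the host and port from the curl example in the Rate Limit Tracker section and may differ on your installation; any authentication the API requires is not shown, and both helper names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:4545"  # assumption: same address as the curl example on this page


def set_key_request(provider: str, api_key: str) -> Request:
    """Build POST /api/providers/{name}/key with a JSON body."""
    body = json.dumps({"api_key": api_key}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/providers/{provider}/key",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def connection_check_request(provider: str) -> Request:
    """Build POST /api/providers/{name}/test."""
    return Request(f"{BASE_URL}/api/providers/{provider}/test", method="POST")


req = set_key_request("anthropic", "sk-...")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```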
Channel Commands
Two chat commands are available in any channel for inspecting models and providers:
/models
Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).
/models
Example output:
Available models (9):
Frontier:
claude-opus-4-20250514 (Anthropic) — 200K ctx
gemini-2.5-pro (Google Gemini) — 1M ctx
Smart:
claude-sonnet-4-20250514 (Anthropic) — 200K ctx
gemini-2.5-flash (Google Gemini) — 1M ctx
deepseek-chat (DeepSeek) — 64K ctx
Balanced:
llama-3.3-70b-versatile (Groq) — 128K ctx
Fast:
claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
gemini-2.0-flash (Google Gemini) — 1M ctx
Local:
llama3.2 (Ollama) — 128K ctx
/providers
Lists all 49 providers with their authentication status.
/providers
Example output:
LLM Providers (49):
Anthropic ANTHROPIC_API_KEY Configured 3 models
OpenAI OPENAI_API_KEY Missing 6 models
Google Gemini GEMINI_API_KEY Configured 3 models
DeepSeek DEEPSEEK_API_KEY Missing 2 models
Groq GROQ_API_KEY Configured 4 models
Ollama (no key needed) Ready 3 models
vLLM (no key needed) Ready 1 model
LM Studio (no key needed) Ready 1 model
...
Environment Variables Summary
Quick reference for all provider environment variables:
| Provider | Env Var | Required |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Yes |
| OpenAI | OPENAI_API_KEY | Yes |
| Google Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | Yes |
| DeepSeek | DEEPSEEK_API_KEY | Yes |
| Groq | GROQ_API_KEY | Yes |
| OpenRouter | OPENROUTER_API_KEY | Yes |
| Mistral AI | MISTRAL_API_KEY | Yes |
| Together AI | TOGETHER_API_KEY | Yes |
| Fireworks AI | FIREWORKS_API_KEY | Yes |
| Ollama | OLLAMA_API_KEY | No |
| vLLM | VLLM_API_KEY | No |
| LM Studio | LMSTUDIO_API_KEY | No |
| Perplexity AI | PERPLEXITY_API_KEY | Yes |
| Cohere | COHERE_API_KEY | Yes |
| AI21 Labs | AI21_API_KEY | Yes |
| Cerebras | CEREBRAS_API_KEY | Yes |
| SambaNova | SAMBANOVA_API_KEY | Yes |
| Hugging Face | HF_API_KEY | Yes |
| xAI | XAI_API_KEY | Yes |
| Replicate | REPLICATE_API_TOKEN | Yes |
| Claude Code | ANTHROPIC_API_KEY | Yes |
| NVIDIA NIM | NVIDIA_API_KEY | Yes |
| Voyage AI | VOYAGE_API_KEY | Yes |
| Anyscale | ANYSCALE_API_KEY | Yes |
| DeepInfra | DEEPINFRA_API_KEY | Yes |
| Azure OpenAI | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT | Yes |
| Amazon Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION | Yes |
| Google Vertex AI | GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION | Yes |
Security Notes
- All API keys are stored as Zeroizing<String> -- the key material is automatically overwritten with zeros when the value is dropped from memory.
- Auth detection (detect_auth()) only checks std::env::var() for presence -- it never reads or logs the actual secret value.
- Provider API keys set via the REST API (POST /api/providers/{name}/key) follow the same zeroization policy.
- The health endpoint (/api/health) never exposes provider auth status or API keys. Detailed info is behind /api/health/detail which requires authentication.
- All DriverConfig and KernelConfig structs implement Debug with secret redaction -- API keys are printed as "***" in logs.
Provider Fallback Chain
Source: librefang-llm-drivers/src/drivers/fallback_chain.rs
A FallbackChain lets you configure an ordered list of LLM providers that LibreFang
tries in sequence. When a provider fails, the runtime classifies the failure and decides
whether to advance to the next provider or surface the error immediately.
Failover Reasons
LibreFang classifies each failure before deciding whether to fall forward:
| FailoverReason | Description | Falls over? |
|---|---|---|
| RateLimit | HTTP 429 or x-ratelimit-remaining: 0 | Yes: try next provider |
| Timeout | Request exceeded the configured deadline | Yes |
| ServerError | HTTP 5xx response | Yes |
| ModelNotFound | HTTP 404 for the requested model ID | Yes |
| AuthError | HTTP 401 / 403 | No: misconfigured key, stop immediately |
| Unknown | Any other error | Yes |
AuthError deliberately does not fall forward — a missing or revoked key requires
operator attention and silently retrying with another provider can mask the problem.
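The classification table can be expressed as a small function. This is a sketch of the mapping only: the function names and argument shape are hypothetical, and treating every 404 as ModelNotFound is a simplification of "404 for the requested model ID".

```python
from enum import Enum


class FailoverReason(Enum):
    RATE_LIMIT = "RateLimit"
    TIMEOUT = "Timeout"
    SERVER_ERROR = "ServerError"
    MODEL_NOT_FOUND = "ModelNotFound"
    AUTH_ERROR = "AuthError"
    UNKNOWN = "Unknown"


def classify_failure(status=None, headers=None, timed_out=False) -> FailoverReason:
    """Map a failed request to one of the reasons in the table above."""
    lowered = {k.lower(): str(v).strip() for k, v in (headers or {}).items()}
    if timed_out:
        return FailoverReason.TIMEOUT
    if status == 429 or lowered.get("x-ratelimit-remaining") == "0":
        return FailoverReason.RATE_LIMIT
    if status in (401, 403):
        return FailoverReason.AUTH_ERROR
    if status == 404:
        return FailoverReason.MODEL_NOT_FOUND  # simplification: any 404
    if status is not None and 500 <= status <= 599:
        return FailoverReason.SERVER_ERROR
    return FailoverReason.UNKNOWN


def should_fail_over(reason: FailoverReason) -> bool:
    """Every reason advances the chain except AuthError."""
    return reason is not FailoverReason.AUTH_ERROR


for status in (429, 401, 503):
    reason = classify_failure(status)
    print(status, reason.value, should_fail_over(reason))
```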
Configuration
Define the chain under [[providers.fallback_chain]] in ~/.librefang/config.toml:
[[providers.fallback_chain]]
name = "anthropic"
model = "claude-sonnet-4-5"
[[providers.fallback_chain]]
name = "openai"
model = "gpt-4o"
[[providers.fallback_chain]]
name = "openai-compatible"
base_url = "http://localhost:11434/v1"
model = "llama3.1:8b"
LibreFang works through the list in order. If all providers fail for eligible reasons, the final error from the last provider is returned to the caller.
Interaction with Credential Pool
Each entry in the fallback chain can itself have multiple keys managed by a Credential Pool. Rate-limit exhaustion on one key causes the pool to rotate to the next key before the chain advances to the next provider.
Credential Pool
Source: librefang-llm-drivers/src/credential_pool.rs
A CredentialPool manages multiple API keys for a single provider and rotates between
them automatically. This extends effective rate limits without manual intervention and
provides continuity when individual keys are temporarily exhausted.
Rotation Strategies
| Strategy | Behavior |
|---|---|
| FillFirst | Always use the first non-exhausted key. Keys are tried in order; exhausted keys are skipped. |
| RoundRobin | Rotate through keys in a fixed cycle, skipping exhausted ones. |
| Random | Pick a random non-exhausted key on each request. |
| LeastUsed | Pick the key with the fewest total requests, skipping exhausted ones. |
The default strategy is RoundRobin.
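A compact way to compare the four strategies is to implement them side by side. The sketch below is illustrative only: `PoolKey` and `KeyPool` are hypothetical names, ties in least-used selection go to the earliest key, and the error raised when every key is exhausted is a placeholder.

```python
import random
from dataclasses import dataclass


@dataclass
class PoolKey:
    name: str
    requests: int = 0
    exhausted: bool = False


class KeyPool:
    def __init__(self, keys, strategy="round_robin"):
        self.keys = list(keys)
        self.strategy = strategy
        self._cursor = 0  # position in the fixed round-robin cycle

    def pick(self) -> PoolKey:
        active = [k for k in self.keys if not k.exhausted]
        if not active:
            raise RuntimeError("all keys exhausted")
        if self.strategy == "fill_first":
            key = active[0]
        elif self.strategy == "round_robin":
            # Walk the full cycle so exhausted keys keep their slot but are skipped.
            while True:
                key = self.keys[self._cursor % len(self.keys)]
                self._cursor += 1
                if not key.exhausted:
                    break
        elif self.strategy == "random":
            key = random.choice(active)
        elif self.strategy == "least_used":
            key = min(active, key=lambda k: k.requests)
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        key.requests += 1
        return key


pool = KeyPool([PoolKey("key-1"), PoolKey("key-2", exhausted=True), PoolKey("key-3")])
order = [pool.pick().name for _ in range(3)]
print(order)
```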
Configuration
[providers.anthropic]
rotation_strategy = "round_robin" # fill_first | round_robin | random | least_used
[[providers.anthropic.keys]]
key = "sk-ant-key-1"
[[providers.anthropic.keys]]
key = "sk-ant-key-2"
[[providers.anthropic.keys]]
key = "sk-ant-key-3"
Exhaustion and Cooldown
A key is marked exhausted when the response headers report zero remaining quota
(x-ratelimit-remaining-requests: 0 or x-ratelimit-remaining-tokens: 0). Exhausted
keys are skipped by all rotation strategies.
After 1 hour, an exhausted key automatically re-enters the active pool. This cooldown is not configurable in the current release; it corresponds to the standard window used by most OpenAI-compatible providers.
If all keys in the pool are simultaneously exhausted, the CredentialPool returns a
RateLimit error, which triggers fallback to the next provider in the chain (if
configured).
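The exhaustion and cooldown rules can be sketched as a per-key state object. Timestamps are passed in explicitly so the behavior is easy to test; `KeyState` and its methods are hypothetical names, and the header comparison assumes the provider sends a literal `0`.

```python
from typing import Optional

COOLDOWN_SECONDS = 3600  # fixed 1-hour cooldown described above


def reports_zero_quota(headers: dict) -> bool:
    """True when either remaining-requests or remaining-tokens is zero."""
    lowered = {k.lower(): str(v).strip() for k, v in headers.items()}
    return (lowered.get("x-ratelimit-remaining-requests") == "0"
            or lowered.get("x-ratelimit-remaining-tokens") == "0")


class KeyState:
    def __init__(self) -> None:
        self.exhausted_at: Optional[float] = None

    def observe(self, headers: dict, now: float) -> None:
        """Mark the key exhausted if the response headers report zero quota."""
        if reports_zero_quota(headers):
            self.exhausted_at = now

    def is_available(self, now: float) -> bool:
        """A key re-enters the pool once the cooldown has elapsed."""
        if self.exhausted_at is None:
            return True
        if now - self.exhausted_at >= COOLDOWN_SECONDS:
            self.exhausted_at = None
            return True
        return False


state = KeyState()
state.observe({"x-ratelimit-remaining-tokens": "0"}, now=100.0)
print(state.is_available(now=200.0), state.is_available(now=3700.0))
```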
Usage Tracking
Each key in the pool tracks:
- Total request count
- Total token count (prompt + completion)
- Exhaustion timestamp (if currently exhausted)
These counters are in-memory only and reset when the daemon restarts. Persistent usage
tracking is available through the budget ledger (/api/providers/{name}/usage).
Rate Limit Tracker
Source: librefang-llm-drivers/src/rate_limit_tracker.rs
After every LLM API response, LibreFang parses the provider's rate-limit headers into a
RateLimitSnapshot. This snapshot is used by the credential pool to detect exhaustion
and is surfaced in the dashboard and CLI for operator visibility.
Parsed Headers
LibreFang recognizes 21 header variants across major providers:
| Header | Meaning |
|---|---|
| x-ratelimit-limit-requests | Per-window request ceiling |
| x-ratelimit-remaining-requests | Requests remaining in current window |
| x-ratelimit-reset-requests | Time until request quota resets |
| x-ratelimit-limit-tokens | Per-window token ceiling |
| x-ratelimit-remaining-tokens | Tokens remaining in current window |
| x-ratelimit-reset-tokens | Time until token quota resets |
| ratelimit-limit | Generic combined limit (some providers) |
| ratelimit-remaining | Generic combined remaining |
| ratelimit-reset | Generic reset time |
| retry-after | Seconds until retry is safe (429 responses) |
| x-rate-limit-limit-requests | Alternate spelling variant |
| x-rate-limit-remaining-requests | Alternate spelling variant |
| … and 9 more provider-specific variants |
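As an illustration of turning these headers into a snapshot, the sketch below handles the request and token headers plus the two alternate spellings listed above. The output shape is loosely modelled on the JSON returned by the rate-limits endpoint, but the function name and the exact fields are assumptions; reset headers are left out because their value format varies by provider.

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Build a small snapshot dict; missing or unparsable values become None."""
    lowered = {name.lower(): str(value).strip() for name, value in headers.items()}

    def first_int(*names):
        # Return the first listed header that is present, parsed as an integer.
        for name in names:
            value = lowered.get(name)
            if value is not None:
                try:
                    return int(value)
                except ValueError:
                    return None
        return None

    return {
        "requests": {
            "limit": first_int("x-ratelimit-limit-requests", "x-rate-limit-limit-requests"),
            "remaining": first_int("x-ratelimit-remaining-requests",
                                   "x-rate-limit-remaining-requests"),
        },
        "tokens": {
            "limit": first_int("x-ratelimit-limit-tokens"),
            "remaining": first_int("x-ratelimit-remaining-tokens"),
        },
        "retry_after_seconds": first_int("retry-after"),
    }


snapshot = parse_rate_limit_headers({
    "X-RateLimit-Limit-Requests": "1000",
    "x-ratelimit-remaining-requests": "153",
    "x-ratelimit-limit-tokens": "90000",
})
print(snapshot)
```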
ASCII Progress Bar
The CLI and dashboard display a compact progress bar for each active provider:
anthropic [████████░░░░░░░░░░░░] req: 847/1000 tok: 42.3K/90K
openai [██████████████░░░░░░] req: 712/1000 tok: 61.1K/90K
The bar width is 20 characters. Each filled block (█) represents 5% of the limit
consumed. An empty bar means no quota data is available from the provider.
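The bar rule (20 cells, one filled cell per 5% consumed) can be sketched as follows. Rounding down to whole cells is an assumption, as is the `render_bar` name; the text above does not say how partial cells are handled.

```python
def render_bar(used, limit, width: int = 20) -> str:
    """Render consumed quota as filled cells; empty when no data is available."""
    if used is None or not limit:
        return "░" * width
    # Assumption: partial cells round down.
    filled = min(width, max(0, used * width // limit))
    return "█" * filled + "░" * (width - filled)


print(f"openai    [{render_bar(712, 1000)}] req: 712/1000")
print(f"unknown   [{render_bar(None, None)}]")
```

With 712 of 1000 requests consumed (71.2%), this fills 14 of the 20 cells.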
Accessing the Snapshot
# Current rate-limit snapshot for all providers
curl http://127.0.0.1:4545/api/providers/rate-limits
{
"anthropic": {
"requests": { "limit": 1000, "remaining": 153, "reset_in_seconds": 34 },
"tokens": { "limit": 90000, "remaining": 47700, "reset_in_seconds": 34 }
}
}
Fields are null when the provider did not return the corresponding header.