Provider Management

This page covers provider catalog loading, model metadata, aliases, routing, spend controls, fallback behavior, REST endpoints, channel commands, and security notes.

Included Topics

  • Dynamic Provider Loading
  • Model Catalog
  • Model Aliases
  • Per-Agent Model Override
  • Model Routing
  • Cost Tracking
  • Fallback Providers
  • API Endpoints
  • Channel Commands
  • Environment Variables Summary
  • Security Notes

Dynamic Provider Loading

Official provider definitions are pre-installed to ~/.librefang/providers/ from the registry. To add or override providers, place custom .toml files in the same directory. Each file defines one provider:

# ~/.librefang/providers/my-endpoint.toml
id = "my-endpoint"
display_name = "My Private Endpoint"
driver = "openai_compatible"
base_url = "https://llm.internal.company.com/v1"
api_key_env = "MY_ENDPOINT_KEY"
key_required = true

[[models]]
id = "my-model-7b"
display_name = "My Model 7B"
tier = "Balanced"
context_window = 32768
max_output_tokens = 4096
input_cost_per_m = 0.0
output_cost_per_m = 0.0
supports_tools = true
supports_vision = false

Files in ~/.librefang/providers/ are loaded at startup and merged into the catalog alongside the builtin providers.
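The merge described above can be pictured with a minimal sketch. It assumes that a custom file whose id matches a builtin provider replaces that entry wholesale; the function name merge_catalog, the dict layout, and the proxy URL are hypothetical and are not taken from LibreFang's loader.

```python
def merge_catalog(builtin, custom):
    """Overlay custom provider definitions on the builtin ones, keyed by id.

    Assumption: a custom provider with the same id as a builtin one replaces
    it entirely; the real loader may merge field by field instead.
    """
    catalog = {provider["id"]: provider for provider in builtin}
    for provider in custom:
        catalog[provider["id"]] = provider
    return catalog


builtin = [{"id": "openai", "base_url": "https://api.openai.com/v1"}]
custom = [
    {"id": "my-endpoint", "base_url": "https://llm.internal.company.com/v1"},
    # Hypothetical override that routes the builtin provider through a proxy.
    {"id": "openai", "base_url": "https://proxy.internal.company.com/v1"},
]
catalog = merge_catalog(builtin, custom)
```

With these inputs the catalog holds two providers, and the openai entry points at the overriding URL.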

Custom Image-Generation Providers

A media driver name outside the builtin list (openai, gemini, elevenlabs, minimax, google_tts) previously failed with "Unknown media provider". A custom provider with a URL configured in [provider_urls] now falls through to a generic OpenAI-compatible image driver:

# config.toml
[provider_urls]
volcengine = "https://open.volcengineapi.com/v1"
tongyi     = "https://dashscope.aliyuncs.com/compatible-mode/v1"

# shell
export VOLCENGINE_API_KEY="sk-..."
export TONGYI_API_KEY="sk-..."

The driver expects the standard OpenAI Images API request and response shape (the /images/generations endpoint, returning b64_json or url). Set [media] image_provider = "volcengine" (or whichever id you used in [provider_urls]) to route image-generation calls there.

Driver-side File Upload

The Moonshot (Kimi) driver uploads attachments to /v1/files before forwarding the request to /v1/chat/completions, so messages with image / PDF / text attachments work end-to-end without you wiring per-attachment marshalling. Files are written to std::env::temp_dir(), content-type validated, size-limited, and filename-sanitised before upload. This is transparent — no extra config; just attach files via POST /api/agents/{id}/upload or the dashboard chat as usual.

Provider Regions

Some providers offer region-specific endpoints. Regions are defined in registry TOML files with an optional api_key_env override:

# In a provider's registry TOML:
[provider.regions.intl]
base_url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

[provider.regions.china]
base_url = "https://api.minimaxi.com/v1"
api_key_env = "MINIMAX_CN_API_KEY"    # Optional: override the default API key env var

Select a region in config.toml:

[provider_regions]
qwen = "intl"
minimax = "china"

Priority: Region selections are applied before explicit [provider_urls] entries. If both are set for the same provider, provider_urls wins.
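The priority rule can be sketched as a small lookup. This is an illustration under stated assumptions, not the kernel's resolver: the function name, the argument layout, and the DEFAULT_URL value used as the registry default are all hypothetical.

```python
# Hypothetical in-memory view of the registry and the config.toml tables.
DEFAULT_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"  # assumed registry default
REGIONS = {
    "qwen": {"intl": {"base_url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"}},
}


def resolve_base_url(provider_id, default_url, regions, provider_regions, provider_urls):
    """Return the effective base URL for a provider.

    Precedence, highest first: an explicit [provider_urls] entry, then the
    region selected in [provider_regions], then the registry default.
    """
    if provider_id in provider_urls:
        return provider_urls[provider_id]
    region = provider_regions.get(provider_id)
    if region is not None and region in regions.get(provider_id, {}):
        return regions[provider_id][region]["base_url"]
    return default_url
```

Calling it with provider_regions = {"qwen": "intl"} and an empty provider_urls yields the intl endpoint; adding a qwen entry to provider_urls makes that URL win.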


Model Catalog

A representative subset of the 230+ builtin models, grouped by provider. Pricing is in USD per million tokens.

| # | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes |
| 2 | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 3 | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes |
| 4 | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes |
| 5 | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 6 | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No |
| 7 | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes |
| 8 | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes |
| 9 | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No |
| 10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes |
| 13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No |
| 14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No |
| 15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No |
| 16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No |
| 17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No |
| 18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No |
| 19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes |
| 20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes |
| 21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes |
| 22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No |
| 23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No |
| 24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No |
| 25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes |
| 26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No |
| 28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No |
| 29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No |
| 30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No |
| 31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No |
| 32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No |
| 33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No |
| 34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No |
| 35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No |
| 36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No |
| 37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No |
| 38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No |
| 40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No |
| 42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No |
| 43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No |
| 44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No |
| 45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No |
| 46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No |
| 47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No |
| 49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No |
| 50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes |
| 51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No |
| 52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No |
| 53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No |

Model Tiers:

| Tier | Description | Typical Use |
|---|---|---|
| Frontier | Most capable, highest cost | Orchestration, architecture, security audits |
| Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis |
| Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks |
| Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks |
| Local | Self-hosted, zero cost | Privacy-first, offline, development |

Notes:

  • Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
  • The entries above are a representative subset of the 230+ builtin models. The full catalog includes additional models per provider and runtime auto-discovered models that vary per installation.

Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-20250514 |
| claude-sonnet | claude-sonnet-4-20250514 |
| haiku | claude-haiku-4-5-20251001 |
| claude-haiku | claude-haiku-4-5-20251001 |
| opus | claude-opus-4-20250514 |
| claude-opus | claude-opus-4-20250514 |
| gpt4 | gpt-4o |
| gpt4o | gpt-4o |
| gpt4-mini | gpt-4o-mini |
| flash | gemini-2.5-flash |
| gemini-flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek-chat |
| llama | llama-3.3-70b-versatile |
| llama-70b | llama-3.3-70b-versatile |
| mixtral | mixtral-8x7b-32768 |
| mistral | mistral-large-latest |
| codestral | codestral-latest |
| grok | grok-2 |
| grok-mini | grok-2-mini |
| sonar | sonar-pro |
| jamba | jamba-1.5-large |
| command-r | command-r-plus |

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
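Case-insensitive alias resolution can be sketched as a dictionary lookup on the lowercased name. The mapping below is a subset of the alias table; the function name resolve_alias and the pass-through behaviour for unknown names are assumptions made for illustration.

```python
# Subset of the alias table, keyed by lowercase alias.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "haiku": "claude-haiku-4-5-20251001",
    "opus": "claude-opus-4-20250514",
    "flash": "gemini-2.5-flash",
    "grok": "grok-2",
}


def resolve_alias(name):
    """Map an alias to its canonical model ID, ignoring case.

    Names that are not aliases are returned unchanged, so canonical IDs
    pass straight through (an assumption, not confirmed behaviour).
    """
    return ALIASES.get(name.lower(), name)
```

For example, resolve_alias("Sonnet") and resolve_alias("sonnet") both return claude-sonnet-4-20250514, while resolve_alias("gpt-4.1") returns its input.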


Per-Agent Model Override

Each agent in your config.toml can specify its own model, overriding the global default:

# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed

When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.
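One way to read the precedence between these settings is sketched below. Only two facts come from the text above: pinned_model beats routing, and an agent's model beats the global default. Where a routed model ranks relative to the agent's own model is an assumption of this sketch, and effective_model is a hypothetical helper name.

```python
def effective_model(agent, default_model, routed_model=None):
    """Pick the model an agent would use for a request.

    agent: dict mirroring one [[agents]] entry.
    routed_model: the router's choice, or None when routing is disabled.
    Assumption: a routed model outranks the agent's configured model.
    """
    if agent.get("pinned_model"):
        return agent["pinned_model"]  # pinned agents are never auto-routed
    if routed_model is not None:
        return routed_model
    return agent.get("model", default_model)
```

An agent with pinned_model set therefore ignores routed_model entirely, and an agent with neither field falls back to the [agents.defaults] model.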


Model Routing

LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.

How It Works

  1. The ModelRouter scores each incoming CompletionRequest based on heuristics
  2. The score maps to a TaskComplexity tier: Simple, Medium, or Complex
  3. Each tier has a pre-configured model

Scoring Heuristics

| Signal | Weight | Logic |
|---|---|---|
| Total message length | 1 point per ~4 chars | Rough token proxy |
| Tool availability | +20 per tool defined | Tools imply multi-step work |
| Code markers | +30 per marker found | Backticks, fn, def, class, import, function, async, await, struct, impl, return |
| Conversation depth | +15 per message > 10 | Deep context = harder reasoning |
| System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks |

Thresholds

| Complexity | Score Range | Default Model |
|---|---|---|
| Simple | score < 100 | claude-haiku-4-5-20251001 |
| Medium | 100 <= score < 500 | claude-sonnet-4-20250514 |
| Complex | score >= 500 | claude-sonnet-4-20250514 |
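The heuristics and thresholds can be combined into a short sketch. It is one interpretation of the tables, not the ModelRouter source: it assumes each distinct code marker counts once, that depth points apply to every message beyond the tenth, that system-prompt points apply to characters beyond the first 500, and that markers are matched as plain substrings. The function names are hypothetical.

```python
CODE_MARKERS = ["`", "fn", "def", "class", "import", "function",
                "async", "await", "struct", "impl", "return"]


def score_request(messages, tools, system_prompt=""):
    """Score a request from its messages, tool definitions, and system prompt."""
    text = "".join(message["content"] for message in messages)
    score = len(text) // 4                                   # ~1 point per 4 chars
    score += 20 * len(tools)                                 # +20 per tool
    score += 30 * sum(1 for m in CODE_MARKERS if m in text)  # +30 per distinct marker
    score += 15 * max(0, len(messages) - 10)                 # +15 per message past 10
    score += max(0, len(system_prompt) - 500) // 10          # +1 per 10 chars past 500
    return score


def classify(score, simple_threshold=100, complex_threshold=500):
    """Map a score to a TaskComplexity tier name."""
    if score < simple_threshold:
        return "Simple"
    if score < complex_threshold:
        return "Medium"
    return "Complex"
```

Under these assumptions, a single 400-character message with two tools defined scores 100 + 40 = 140, which lands in the Medium tier.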

Configuration

Model routing is configured per agent in the agent manifest, not as a top-level section in ~/.librefang/config.toml. A top-level [routing] section in config.toml is ignored, and the kernel logs Unknown config field (ignored) field="routing".

# In an agent manifest file
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

The router also integrates with the model catalog:

  • validate_models() checks that all configured model IDs exist in the catalog
  • resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")

Cost Tracking

LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

Per-Response Cost Estimation

After each LLM call, cost is calculated as:

cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate

The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
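As a worked example of the formula, the sketch below prices a hypothetical call of 10,000 input tokens and 2,000 output tokens at the Claude Sonnet 4 catalog rates ($3.00 in, $15.00 out). The function name estimate_cost is illustrative only.

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the cost in USD; both rates are USD per million tokens."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# 10,000 / 1,000,000 * 3.00 = 0.03 and 2,000 / 1,000,000 * 15.00 = 0.03
cost = estimate_cost(10_000, 2_000, 3.00, 15.00)
```

The two terms contribute $0.03 each, for a total of $0.06.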

Cost Rates (per million tokens)

| Model Pattern | Input $/M | Output $/M |
|---|---|---|
| *haiku* | $0.25 | $1.25 |
| *sonnet* | $3.00 | $15.00 |
| *opus* | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.15 | $0.60 |
| gemini-2.0-flash | $0.10 | $0.40 |
| deepseek-reasoner / deepseek-r1 | $0.55 | $2.19 |
| *deepseek* | $0.27 | $1.10 |
| *cerebras* | $0.06 | $0.06 |
| *sambanova* | $0.06 | $0.06 |
| *replicate* | $0.40 | $0.40 |
| *llama* / *mixtral* | $0.05 | $0.10 |
| *qwen* | $0.20 | $0.60 |
| mistral-large* | $2.00 | $6.00 |
| *mistral* (other) | $0.10 | $0.30 |
| command-r-plus | $2.50 | $10.00 |
| command-r | $0.15 | $0.60 |
| sonar-pro | $3.00 | $15.00 |
| *sonar* (other) | $1.00 | $5.00 |
| grok-2-mini / grok-mini | $0.30 | $0.50 |
| *grok* (other) | $2.00 | $10.00 |
| *jamba* | $2.00 | $8.00 |
| Default (unknown) | $1.00 | $3.00 |
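The ordering of the table suggests that more specific patterns are tested before broader ones (gpt-4o-mini before gpt-4o, deepseek-r1 before *deepseek*). The sketch below models that with an ordered list and substring matching. Both the matching rule and the name fallback_rate are assumptions; only a subset of rows is included.

```python
# (substring, input $/M, output $/M), checked top to bottom.
FALLBACK_RATES = [
    ("haiku", 0.25, 1.25),
    ("sonnet", 3.00, 15.00),
    ("opus", 15.00, 75.00),
    ("gpt-4o-mini", 0.15, 0.60),   # must precede the broader "gpt-4o"
    ("gpt-4o", 2.50, 10.00),
    ("deepseek-reasoner", 0.55, 2.19),
    ("deepseek-r1", 0.55, 2.19),
    ("deepseek", 0.27, 1.10),
]
DEFAULT_RATE = (1.00, 3.00)


def fallback_rate(model_id):
    """Return (input_rate, output_rate) for a model missing from the catalog."""
    lowered = model_id.lower()
    for needle, input_rate, output_rate in FALLBACK_RATES:
        if needle in lowered:
            return (input_rate, output_rate)
    return DEFAULT_RATE
```

Swapping the gpt-4o-mini and gpt-4o rows would price every mini model at the full gpt-4o rate, which is why the order matters in a first-match scheme.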

Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.

# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00   # cap at $5/hour
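The check performed before each call can be sketched as a rolling one-hour window. Whether LibreFang uses a rolling or a fixed window, and whether the cap itself counts as exceeded, is not stated here, so both choices below are assumptions. The class and method names are hypothetical; only the QuotaExceeded error name comes from the text.

```python
import time
from collections import deque


class QuotaExceeded(Exception):
    """Raised when an agent has used up its hourly budget."""


class HourlyQuota:
    def __init__(self, max_cost_per_hour_usd):
        self.limit = max_cost_per_hour_usd
        self.events = deque()  # (timestamp, cost_usd), oldest first

    def _spent(self, now):
        # Drop spend that is older than one hour, then total the rest.
        while self.events and now - self.events[0][0] >= 3600:
            self.events.popleft()
        return sum(cost for _, cost in self.events)

    def check(self, now=None):
        """Call before each LLM request; raises once the cap is reached."""
        now = time.time() if now is None else now
        if self._spent(now) >= self.limit:
            raise QuotaExceeded(f"hourly limit of ${self.limit:.2f} reached")

    def record(self, cost_usd, now=None):
        """Call after each LLM response with its estimated cost."""
        now = time.time() if now is None else now
        self.events.append((now, cost_usd))
```

With a $5.00 cap, recording $4.00 and then $1.50 makes the next check raise QuotaExceeded until the earlier spend ages out of the window.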

The usage footer (when enabled) appends cost information to each response:

> Cost: $0.0087 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514

Fallback Providers

The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

Behavior

  • On success: returns immediately
  • On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT fail over, because the primary should be retried after backoff)
  • On all other errors: logs a warning and tries the next driver in the chain
  • If all drivers fail: returns the last error
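The four rules above can be sketched as a loop over drivers. This is an illustration, not the FallbackDriver source: drivers are modelled as plain callables, and ProviderError with its status attribute is a hypothetical stand-in for the real error type.

```python
import logging

RETRY_STATUSES = {429, 529}  # rate limit / overload: retry the primary instead


class ProviderError(Exception):
    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def complete_with_fallback(drivers, request):
    """Try each driver in order and return the first successful response."""
    last_error = None
    for driver in drivers:
        try:
            return driver(request)           # success: return immediately
        except ProviderError as err:
            if err.status in RETRY_STATUSES:
                raise                        # bubble up for retry logic
            logging.warning("driver failed (%s), trying next", err)
            last_error = err
    if last_error is None:
        raise ValueError("no drivers configured")
    raise last_error                         # all drivers failed
```

A 500 from the first driver moves on to the second, whereas a 429 from the first driver is re-raised without touching the rest of the chain.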

Configuration

Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.

# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]

The fallback driver creates a chain: AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq).


API Endpoints

List All Models

GET /api/models

Returns the complete model catalog with metadata, pricing, and feature flags.

Response:

[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]

Get Specific Model

GET /api/models/{id}

Returns a single model entry. Supports both canonical IDs and aliases.

GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514

List Aliases

GET /api/models/aliases

Returns a map of all alias-to-canonical-ID mappings.

Response:

{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}

List Providers

GET /api/providers

Returns all 49 providers with auth status and model counts.

Response:

[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]

Auth status values: Configured, Missing, NotRequired.
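A client can use auth_status to decide which providers are ready to serve requests. The sketch below filters an already-parsed /api/providers response; the helper name usable_providers is hypothetical, and the sample payload copies the fields shown in the response above plus one provider with a missing key.

```python
import json


def usable_providers(providers):
    """Return the ids of providers that are configured or need no key."""
    return [p["id"] for p in providers
            if p["auth_status"] in ("Configured", "NotRequired")]


sample = json.loads("""
[
  {"id": "anthropic", "auth_status": "Configured", "model_count": 3},
  {"id": "openai", "auth_status": "Missing", "model_count": 6},
  {"id": "ollama", "auth_status": "NotRequired", "model_count": 5}
]
""")
ready = usable_providers(sample)
```

For this payload, ready contains anthropic and ollama, and openai is left out because its key is missing.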

Set Provider API Key

POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }

Configures an API key for a provider at runtime (stored as a Zeroizing<String>, wiped from memory on drop).

Remove Provider API Key

DELETE /api/providers/{name}/key

Removes the configured API key for a provider.

Test Provider Connection

POST /api/providers/{name}/test

Sends a minimal test request to verify the provider is reachable and the API key is valid.


Channel Commands

Two chat commands are available in any channel for inspecting models and providers:

/models

Lists all available models with their tier, provider, and context window. Only shows models from providers that have authentication configured (or do not require it).

/models

Example output:

Available models (12):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx

/providers

Lists all 49 providers with their authentication status.

/providers

Example output:

LLM Providers (49):

  Anthropic          ANTHROPIC_API_KEY       Configured    3 models
  OpenAI             OPENAI_API_KEY          Missing       6 models
  Google Gemini      GEMINI_API_KEY          Configured    3 models
  DeepSeek           DEEPSEEK_API_KEY        Missing       2 models
  Groq               GROQ_API_KEY            Configured    4 models
  Ollama             (no key needed)         Ready         3 models
  vLLM               (no key needed)         Ready         1 model
  LM Studio          (no key needed)         Ready         1 model
  ...

Environment Variables Summary

Quick reference for all provider environment variables:

ProviderEnv VarRequired
AnthropicANTHROPIC_API_KEYYes
OpenAIOPENAI_API_KEYYes
Google GeminiGEMINI_API_KEY or GOOGLE_API_KEYYes
DeepSeekDEEPSEEK_API_KEYYes
GroqGROQ_API_KEYYes
OpenRouterOPENROUTER_API_KEYYes
Mistral AIMISTRAL_API_KEYYes
Together AITOGETHER_API_KEYYes
Fireworks AIFIREWORKS_API_KEYYes
OllamaOLLAMA_API_KEYNo
vLLMVLLM_API_KEYNo
LM StudioLMSTUDIO_API_KEYNo
Perplexity AIPERPLEXITY_API_KEYYes
CohereCOHERE_API_KEYYes
AI21 LabsAI21_API_KEYYes
CerebrasCEREBRAS_API_KEYYes
SambaNovaSAMBANOVA_API_KEYYes
Hugging FaceHF_API_KEYYes
xAIXAI_API_KEYYes
ReplicateREPLICATE_API_TOKENYes
Claude CodeANTHROPIC_API_KEYYes
NVIDIA NIMNVIDIA_API_KEYYes
Voyage AIVOYAGE_API_KEYYes
AnyscaleANYSCALE_API_KEYYes
DeepInfraDEEPINFRA_API_KEYYes
Azure OpenAIAZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINTYes
Amazon BedrockAWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGIONYes
Google Vertex AIGOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATIONYes

Security Notes

  • All API keys are stored as Zeroizing<String> -- the key material is automatically overwritten with zeros when the value is dropped from memory.
  • Auth detection (detect_auth()) only checks std::env::var() for presence -- it never reads or logs the actual secret value.
  • Provider API keys set via the REST API (POST /api/providers/{name}/key) follow the same zeroization policy.
  • The health endpoint (/api/health) never exposes provider auth status or API keys. Detailed info is behind /api/health/detail which requires authentication.
  • All DriverConfig and KernelConfig structs implement Debug with secret redaction -- API keys are printed as "***" in logs.

Provider Fallback Chain

Source: librefang-llm-drivers/src/drivers/fallback_chain.rs

A FallbackChain lets you configure an ordered list of LLM providers that LibreFang tries in sequence. When a provider fails, the runtime classifies the failure and decides whether to advance to the next provider or surface the error immediately.

Failover Reasons

LibreFang classifies each failure before deciding whether to fall forward:

| FailoverReason | Description | Falls over? |
|---|---|---|
| RateLimit | HTTP 429 or x-ratelimit-remaining: 0 | Yes (try next provider) |
| Timeout | Request exceeded the configured deadline | Yes |
| ServerError | HTTP 5xx response | Yes |
| ModelNotFound | HTTP 404 for the requested model ID | Yes |
| AuthError | HTTP 401 / 403 | No (misconfigured key, stop immediately) |
| Unknown | Any other error | Yes |

AuthError deliberately does not fall forward — a missing or revoked key requires operator attention and silently retrying with another provider can mask the problem.
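The classification table can be expressed as a small function. This is a sketch under assumptions, not the code in fallback_chain.rs: it treats every 404 as ModelNotFound, reads the remaining-quota header as a string, and uses hypothetical function names.

```python
from enum import Enum


class FailoverReason(Enum):
    RATE_LIMIT = "RateLimit"
    TIMEOUT = "Timeout"
    SERVER_ERROR = "ServerError"
    MODEL_NOT_FOUND = "ModelNotFound"
    AUTH_ERROR = "AuthError"
    UNKNOWN = "Unknown"


def classify_failure(status=None, timed_out=False, headers=None):
    """Classify a failed request from its HTTP status, timeout flag, and headers."""
    headers = headers or {}
    if timed_out:
        return FailoverReason.TIMEOUT
    if status == 429 or headers.get("x-ratelimit-remaining") == "0":
        return FailoverReason.RATE_LIMIT
    if status in (401, 403):
        return FailoverReason.AUTH_ERROR
    if status == 404:
        return FailoverReason.MODEL_NOT_FOUND
    if status is not None and 500 <= status <= 599:
        return FailoverReason.SERVER_ERROR
    return FailoverReason.UNKNOWN


def should_fail_over(reason):
    """Every reason advances the chain except AuthError."""
    return reason is not FailoverReason.AUTH_ERROR
```

A 401 is classified as AuthError and stops the chain, while a 503 is classified as ServerError and advances to the next provider.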

Configuration

Define the chain under [[providers.fallback_chain]] in ~/.librefang/config.toml:

[[providers.fallback_chain]]
name = "anthropic"
model = "claude-sonnet-4-5"

[[providers.fallback_chain]]
name = "openai"
model = "gpt-4o"

[[providers.fallback_chain]]
name = "openai-compatible"
base_url = "http://localhost:11434/v1"
model = "llama3.1:8b"

LibreFang works through the list in order. If all providers fail for eligible reasons, the final error from the last provider is returned to the caller.

Interaction with Credential Pool

Each entry in the fallback chain can itself have multiple keys managed by a Credential Pool. Rate-limit exhaustion on one key causes the pool to rotate to the next key before the chain advances to the next provider.


Credential Pool

Source: librefang-llm-drivers/src/credential_pool.rs

A CredentialPool manages multiple API keys for a single provider and rotates between them automatically. This extends effective rate limits without manual intervention and provides continuity when individual keys are temporarily exhausted.

Rotation Strategies

| Strategy | Behavior |
|---|---|
| FillFirst | Always use the first non-exhausted key. Keys are tried in order; exhausted keys are skipped. |
| RoundRobin | Rotate through keys in a fixed cycle, skipping exhausted ones. |
| Random | Pick a random non-exhausted key on each request. |
| LeastUsed | Pick the key with the fewest total requests, skipping exhausted ones. |

The default strategy is RoundRobin.
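The four strategies differ only in how they choose among non-exhausted keys, which the sketch below makes concrete. The data layout, the cursor argument for round-robin bookkeeping, and the name pick_key are assumptions for illustration rather than the CredentialPool API.

```python
import random


def pick_key(keys, strategy, cursor=0):
    """Return the index of the key to use next, or None if all are exhausted.

    keys: list of dicts with "requests" (int) and "exhausted" (bool).
    cursor: index where a round-robin scan starts (hypothetical bookkeeping).
    """
    active = [i for i, key in enumerate(keys) if not key["exhausted"]]
    if not active:
        return None
    if strategy == "fill_first":
        return active[0]
    if strategy == "round_robin":
        n = len(keys)
        # First non-exhausted key at or after the cursor, wrapping around.
        return next(i for i in ((cursor + step) % n for step in range(n)) if i in active)
    if strategy == "random":
        return random.choice(active)
    if strategy == "least_used":
        return min(active, key=lambda i: keys[i]["requests"])
    raise ValueError(f"unknown rotation strategy: {strategy}")
```

A caller using round_robin would advance the cursor to the returned index plus one after each request so that the next call starts from the following key.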

Configuration

[providers.anthropic]
rotation_strategy = "round_robin"  # fill_first | round_robin | random | least_used

[[providers.anthropic.keys]]
key = "sk-ant-key-1"

[[providers.anthropic.keys]]
key = "sk-ant-key-2"

[[providers.anthropic.keys]]
key = "sk-ant-key-3"

Exhaustion and Cooldown

A key is marked exhausted when the response headers report zero remaining quota (x-ratelimit-remaining-requests: 0 or x-ratelimit-remaining-tokens: 0). Exhausted keys are skipped by all rotation strategies.

After 1 hour, an exhausted key automatically re-enters the active pool. This cooldown is not configurable in the current release; it corresponds to the standard window used by most OpenAI-compatible providers.

If all keys in the pool are simultaneously exhausted, the CredentialPool returns a RateLimit error, which triggers fallback to the next provider in the chain (if configured).
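Exhaustion detection and the one-hour cooldown can be sketched with two helpers. The header names and the 3600-second cooldown come from the text above; the function names and the dict of exhaustion timestamps are hypothetical.

```python
COOLDOWN_SECONDS = 3600  # exhausted keys rejoin the pool after one hour


def reports_exhaustion(headers):
    """True if either remaining-quota header reports zero."""
    for name in ("x-ratelimit-remaining-requests", "x-ratelimit-remaining-tokens"):
        value = headers.get(name)
        if value is not None and value.strip() == "0":
            return True
    return False


def active_keys(exhausted_at, now):
    """Return the key names that are usable at time `now`.

    exhausted_at maps each key name to the timestamp at which it was marked
    exhausted, or None if it is not exhausted.
    """
    return [name for name, stamp in exhausted_at.items()
            if stamp is None or now - stamp >= COOLDOWN_SECONDS]
```

When active_keys returns an empty list, the pool has nothing left to try, which corresponds to the RateLimit error described above.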

Usage Tracking

Each key in the pool tracks:

  • Total request count
  • Total token count (prompt + completion)
  • Exhaustion timestamp (if currently exhausted)

These counters are in-memory only and reset when the daemon restarts. Persistent usage tracking is available through the budget ledger (/api/providers/{name}/usage).


Rate Limit Tracker

Source: librefang-llm-drivers/src/rate_limit_tracker.rs

After every LLM API response, LibreFang parses the provider's rate-limit headers into a RateLimitSnapshot. This snapshot is used by the credential pool to detect exhaustion and is surfaced in the dashboard and CLI for operator visibility.

Parsed Headers

LibreFang recognizes 21 header variants across major providers:

| Header | Meaning |
|---|---|
| x-ratelimit-limit-requests | Per-window request ceiling |
| x-ratelimit-remaining-requests | Requests remaining in current window |
| x-ratelimit-reset-requests | Time until request quota resets |
| x-ratelimit-limit-tokens | Per-window token ceiling |
| x-ratelimit-remaining-tokens | Tokens remaining in current window |
| x-ratelimit-reset-tokens | Time until token quota resets |
| ratelimit-limit | Generic combined limit (some providers) |
| ratelimit-remaining | Generic combined remaining |
| ratelimit-reset | Generic reset time |
| retry-after | Seconds until retry is safe (429 responses) |
| x-rate-limit-limit-requests | Alternate spelling variant |
| x-rate-limit-remaining-requests | Alternate spelling variant |
| … | and 9 more provider-specific variants |
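Parsing a handful of these headers into a snapshot-like structure might look like the sketch below. It covers only the limit and remaining headers, skips reset times because their format varies by provider, and uses a hypothetical function name and result layout rather than the real RateLimitSnapshot type.

```python
def _to_int(value):
    """Convert a header value to int, or None if it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers):
    """Build a dict of request and token quotas from response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def pick(*names):
        # Return the first header present among the accepted spellings.
        for name in names:
            if name in lowered:
                return _to_int(lowered[name])
        return None

    return {
        "requests": {
            "limit": pick("x-ratelimit-limit-requests", "x-rate-limit-limit-requests"),
            "remaining": pick("x-ratelimit-remaining-requests", "x-rate-limit-remaining-requests"),
        },
        "tokens": {
            "limit": pick("x-ratelimit-limit-tokens"),
            "remaining": pick("x-ratelimit-remaining-tokens"),
        },
    }
```

Fields whose header is absent come back as None, mirroring the null fields in the snapshot JSON.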

ASCII Progress Bar

The CLI and dashboard display a compact progress bar for each active provider:

anthropic  [████████████████░░░░]  req: 847/1000  tok: 42.3K/90K
openai     [██████████████░░░░░░]  req: 712/1000  tok: 61.1K/90K

The bar is 20 characters wide. Each filled block (█) represents 5% of the limit consumed. An empty bar means the provider returned no quota data.
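Rendering such a bar takes only a few lines. The sketch below assumes the filled count is rounded down, which reproduces the openai line above (712 of 1000 requests gives 14 blocks); the function name render_bar is hypothetical.

```python
def render_bar(used, limit, width=20):
    """Render a fixed-width usage bar; empty when no quota data is available."""
    if used is None or not limit:
        return "[" + "░" * width + "]"
    # Integer arithmetic rounds down: each block stands for limit/width units.
    filled = min(width, used * width // limit)
    return "[" + "█" * filled + "░" * (width - filled) + "]"
```

Usage above the limit is clamped to a full bar so the output never exceeds the configured width.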

Accessing the Snapshot

# Current rate-limit snapshot for all providers
curl http://127.0.0.1:4545/api/providers/rate-limits

Response:

{
  "anthropic": {
    "requests": { "limit": 1000, "remaining": 153, "reset_in_seconds": 34 },
    "tokens":   { "limit": 90000, "remaining": 47700, "reset_in_seconds": 34 }
  }
}

Fields are null when the provider did not return the corresponding header.