Local & Self-Hosted Providers
This page covers providers that run on your machine, inside your network, or behind a self-hosted inference endpoint that LibreFang can reach directly.
Included Providers
- Ollama
- vLLM
- LM Studio
Ollama
| Property | Value |
|---|---|
| Display Name | Ollama |
| Driver | OpenAI-compatible |
| Env Var | OLLAMA_API_KEY (not required) |
| Base URL | http://localhost:11434/v1 |
| Key Required | No |
| Free Tier | Free (local) |
| Auth | None (local) |
| Models | 3 builtin + auto-discovered |
Available Models (builtin):
- llama3.2 (Local)
- mistral:latest (Local)
- phi3 (Local)
Setup:
- Install Ollama from ollama.com
- Pull a model: `ollama pull llama3.2`
- Start the server: `ollama serve`
- No env var needed; Ollama is always available
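Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with the Local tier and zero cost. Any model you pull becomes usable as soon as the next background probe (or a manual "Test connection") picks it up; no daemon restart is needed.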
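The discovery-and-merge step can be sketched as follows. This is a minimal Python illustration, not LibreFang's Rust code: it reads Ollama's native `GET /api/tags` response (a `models` array whose entries carry a `name`), turns each model into a catalog entry with the Local tier and zero cost, and merges it into an existing catalog. The entry fields (`input_cost`, `output_cost`) and the rule that builtin ids win on a collision are assumptions made for the example.

```python
import json
import urllib.request


def fetch_ollama_tags(host: str = "http://localhost:11434", timeout: float = 2.0) -> str:
    """Return the raw JSON body of Ollama's native GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def discover_ollama_models(tags_json: str) -> list:
    """Turn an /api/tags body into catalog entries (hypothetical entry shape)."""
    entries = []
    for model in json.loads(tags_json).get("models", []):
        entries.append({
            "id": model["name"],   # e.g. "llama3.2:latest"
            "provider": "ollama",
            "tier": "Local",       # discovered models get the Local tier
            "input_cost": 0.0,     # and zero cost in both directions
            "output_cost": 0.0,
        })
    return entries


def merge_into_catalog(catalog: dict, discovered: list) -> dict:
    """Add discovered models; existing ids are kept as-is (an assumption)."""
    merged = dict(catalog)
    for entry in discovered:
        merged.setdefault(entry["id"], entry)
    return merged


# Usage with a canned /api/tags body instead of a live server:
sample = '{"models": [{"name": "qwen2.5:7b", "details": {"parameter_size": "7.6B"}}]}'
builtin = {"llama3.2": {"id": "llama3.2", "provider": "ollama", "tier": "Local"}}
merged = merge_into_catalog(builtin, discover_ollama_models(sample))
print(sorted(merged))  # ['llama3.2', 'qwen2.5:7b']
```

Against a running server you would pass `fetch_ollama_tags()` instead of `sample`.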
vLLM
| Property | Value |
|---|---|
| Display Name | vLLM |
| Driver | OpenAI-compatible |
| Env Var | VLLM_API_KEY (not required) |
| Base URL | http://localhost:8000/v1 |
| Key Required | No |
| Free Tier | Free (self-hosted) |
| Auth | None (local) |
| Models | 1 builtin + auto-discovered |
Available Models (builtin):
- vllm-local (Local)
Setup:
- Install vLLM: `pip install vllm`
- Start the server: `python -m vllm.entrypoints.openai.api_server --model <model-name>`
- No env var needed
LM Studio
| Property | Value |
|---|---|
| Display Name | LM Studio |
| Driver | OpenAI-compatible |
| Env Var | LMSTUDIO_API_KEY (not required) |
| Base URL | http://localhost:1234/v1 |
| Key Required | No |
| Free Tier | Free (local) |
| Auth | None (local) |
| Models | 1 builtin + auto-discovered |
Available Models (builtin):
- lmstudio-local (Local)
Setup:
- Download LM Studio from lmstudio.ai
- Download a model from the built-in model browser
- Start the local server from the "Local Server" tab
- No env var needed
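Before pointing LibreFang at one of these servers, you can confirm it answers on its default base URL. The sketch below is a standalone Python check, not part of LibreFang. It assumes each server exposes the OpenAI-compatible `GET /models` route under the base URLs from the tables above, which vLLM and LM Studio serve and recent Ollama versions also provide; the dictionary labels are just names for the printout.

```python
import urllib.error
import urllib.request

# Default base URLs taken from the provider tables on this page.
DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "vllm": "http://localhost:8000/v1",
    "lmstudio": "http://localhost:1234/v1",
}


def is_up(base_url: str, timeout: float = 1.0) -> bool:
    """True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts all subclass OSError
        return False


if __name__ == "__main__":
    for name, url in DEFAULT_BASE_URLS.items():
        print(f"{name:9} {url:28} {'up' if is_up(url) else 'down'}")
```

A non-200 answer (for example, an auth error if you enabled a key on the server) is reported as down, which keeps the check conservative.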
Embedding auto-detection
When the embeddings driver is set to `provider = "auto"`, LibreFang walks a priority list and picks the first embeddings backend it can use. Cloud API keys are checked first; local servers come after them. The current order (verified against `librefang-runtime::embedding::detect_embedding_provider`) is:
1. `OPENAI_API_KEY` → `"openai"`
2. `OPENROUTER_API_KEY` → `"openrouter"`
3. `MISTRAL_API_KEY` → `"mistral"`
4. `TOGETHER_API_KEY` → `"together"`
5. `FIREWORKS_API_KEY` → `"fireworks"`
6. `COHERE_API_KEY` → `"cohere"`
7. `OLLAMA_HOST` (local)
8. `VLLM_BASE_URL` (local)
9. `LMSTUDIO_BASE_URL` (local)
Setting any of `OLLAMA_HOST`, `VLLM_BASE_URL`, or `LMSTUDIO_BASE_URL` to a reachable URL is enough; the local entries need no API key. `create_embedding_driver` applies the same priority, so when `auto` resolves to, say, vLLM at startup, it builds the same driver that an explicit `provider = "vllm"` would.
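The ladder above can be sketched in a few lines of Python. This is a hypothetical re-creation of the documented order, not the Rust `detect_embedding_provider` itself: a non-empty cloud key wins immediately, while a local entry also has to pass a caller-supplied `is_reachable` check (a hypothetical hook standing in for LibreFang's real probe). The provider names `"ollama"` and `"lmstudio"` are assumptions; the page only spells out `"vllm"`.

```python
from typing import Callable, Mapping, Optional

# Cloud entries in documented order: the first non-empty key wins.
CLOUD_LADDER = [
    ("OPENAI_API_KEY", "openai"),
    ("OPENROUTER_API_KEY", "openrouter"),
    ("MISTRAL_API_KEY", "mistral"),
    ("TOGETHER_API_KEY", "together"),
    ("FIREWORKS_API_KEY", "fireworks"),
    ("COHERE_API_KEY", "cohere"),
]

# Local entries come after every cloud key ("ollama"/"lmstudio" names assumed).
LOCAL_LADDER = [
    ("OLLAMA_HOST", "ollama"),
    ("VLLM_BASE_URL", "vllm"),
    ("LMSTUDIO_BASE_URL", "lmstudio"),
]


def detect_embedding_provider(
    env: Mapping[str, str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first usable embeddings provider, or None.

    GROQ_API_KEY is intentionally absent: Groq has no /v1/embeddings endpoint.
    """
    for var, provider in CLOUD_LADDER:
        if env.get(var):  # a key is enough for cloud providers
            return provider
    for var, provider in LOCAL_LADDER:
        url = env.get(var)
        if url and is_reachable(url):  # local servers must actually answer
            return provider
    return None


# Groq key ignored, Ollama configured but down, vLLM configured and up:
up = {"http://localhost:8000/v1"}
env = {
    "GROQ_API_KEY": "gsk-example",
    "OLLAMA_HOST": "http://localhost:11434",
    "VLLM_BASE_URL": "http://localhost:8000/v1",
}
picked = detect_embedding_provider(env, lambda url: url in up)
print(picked)  # vllm
```

Passing the reachability check in as a function keeps the ordering logic testable without any server running.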
`GROQ_API_KEY` is deliberately left out of the ladder: Groq exposes no `/v1/embeddings` endpoint, so auto-picking it would only surface as a 404 on the first real embedding call. Likewise, providers whose embedding driver is not wired into auto-pick (VOYAGE, JINA, MIXEDBREAD, BEDROCK) are not auto-detected even if their chat keys are set; Bedrock does work when configured explicitly.
Capability detection (Ollama metadata pipeline)
For Ollama, LibreFang probes /api/tags at startup and on the periodic provider refresh, then derives each model's capabilities from the metadata it returns:
| Capability | How it's inferred |
|---|---|
| vision | Model card declares an image-input modality, or the family is in the known-vision list (llava, bakllava, clip, …) |
| embedding | Model card declares an embedding-only architecture (nomic-embed-text, bge-m3, mxbai-embed-large, …) |
| tools | Model card declares tool-use support (newer Llama / Qwen / Mistral instruct families) |
| parameter_size | Lifted verbatim from the Ollama tag (e.g. 7.6B, 14.8B) |
| quantization_level | Lifted verbatim (e.g. Q4_K_M, F16) |
The dashboard's "Models" tab and the model-switcher dropdown show these badges per model, and the same data feeds the model_catalog's Modality enum (Text / Image / Audio), so a caller can filter models by modality (for example, image-capable models) without a hard-coded list.
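The inference rules in the table can be sketched like this. It is an illustrative Python version, not LibreFang's implementation: the `capabilities` field stands in for whatever the model card declares (an assumption), the known-vision and known-embedding sets contain only the examples listed above, and mapping `vision` to an `IMAGE` modality is a hypothetical reading of how badges feed the Modality enum.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Modality(Enum):
    # Mirrors the Text / Image / Audio variants this page names.
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"


# Only the examples listed on this page; the real lists are longer.
KNOWN_VISION_FAMILIES = {"llava", "bakllava", "clip"}
KNOWN_EMBEDDING_MODELS = {"nomic-embed-text", "bge-m3", "mxbai-embed-large"}


@dataclass
class ModelCaps:
    name: str
    vision: bool = False
    embedding: bool = False
    tools: bool = False
    parameter_size: Optional[str] = None
    quantization_level: Optional[str] = None
    modalities: set = field(default_factory=set)


def infer_capabilities(model: dict) -> ModelCaps:
    details = model.get("details") or {}
    # "capabilities" stands in for what the model card declares (assumed field).
    declared = set(model.get("capabilities") or [])
    families = set(details.get("families") or []) | {details.get("family")}
    base_name = model["name"].split(":", 1)[0]  # "bge-m3:latest" -> "bge-m3"

    caps = ModelCaps(
        name=model["name"],
        vision="vision" in declared or bool(families & KNOWN_VISION_FAMILIES),
        embedding="embedding" in declared or base_name in KNOWN_EMBEDDING_MODELS,
        tools="tools" in declared,
        parameter_size=details.get("parameter_size"),          # verbatim, e.g. "7.6B"
        quantization_level=details.get("quantization_level"),  # verbatim, e.g. "Q4_K_M"
    )
    caps.modalities = {Modality.TEXT}
    if caps.vision:
        caps.modalities.add(Modality.IMAGE)
    return caps


llava = infer_capabilities({
    "name": "llava:7b",
    "details": {"family": "llama", "families": ["llama", "clip"],
                "parameter_size": "7B", "quantization_level": "Q4_0"},
})
embedder = infer_capabilities({"name": "bge-m3:latest", "details": {}})
image_models = [m.name for m in (llava, embedder) if Modality.IMAGE in m.modalities]
print(image_models)  # ['llava:7b']
```

The filter in the last lines is the point of the design: callers ask for a modality instead of maintaining their own list of model names.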
vLLM and LM Studio get the same probe through their OpenAI-compatible /v1/models endpoint, but that endpoint returns much less metadata, so the Ollama path populates the most fields.
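For comparison, the OpenAI-style list returned by `GET {base_url}/models` is little more than `{"object": "list", "data": [{"id": ...}, ...]}`. The Python sketch below, an illustration rather than LibreFang's code, parses that shape into catalog stubs; the stub fields are hypothetical, and the capability badges stay `None` because this response carries nothing to infer them from.

```python
import json
import urllib.request


def fetch_models(base_url: str, timeout: float = 2.0) -> str:
    """Fetch the raw /models body, e.g. base_url='http://localhost:8000/v1'."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def list_openai_compat_models(models_json: str) -> list:
    """Parse an OpenAI-format model list into catalog stubs (hypothetical shape)."""
    stubs = []
    for item in json.loads(models_json).get("data", []):
        stubs.append({
            "id": item["id"],
            "tier": "Local",
            # No vision/tools/size fields in this format, so the badges stay unknown.
            "vision": None,
            "tools": None,
            "parameter_size": None,
        })
    return stubs


sample = ('{"object": "list", "data": [{"id": "Qwen/Qwen2.5-7B-Instruct", '
          '"object": "model", "owned_by": "vllm"}]}')
stubs = list_openai_compat_models(sample)
print([s["id"] for s in stubs])  # ['Qwen/Qwen2.5-7B-Instruct']
```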
Reprobe cadence
Local providers are probed at startup and re-probed every 60 seconds in the background. A model you add with `ollama pull` while the daemon is running therefore shows up in the catalog at the next tick, without a daemon restart, and a stopped local server flips to `auth_status: "local_offline"` automatically. The "Test connection" button on a provider card in the dashboard forces an immediate re-probe and cache refresh, so you don't have to wait for the next periodic tick.
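The cadence can be modelled with a small cache driven by an explicit clock. This Python sketch is hypothetical: the class, the `"local_online"` status (only `"local_offline"` comes from this page), and keeping the last known model list when a probe fails are all assumptions. A real daemon would call `tick` from a background timer; passing `now` explicitly keeps the example deterministic.

```python
from typing import Callable, Optional

REPROBE_INTERVAL_S = 60.0   # documented background cadence
OFFLINE = "local_offline"   # status value named on this page
ONLINE = "local_online"     # hypothetical counterpart, not from the docs


class LocalProviderProbe:
    """Hypothetical cache of one local provider's status and model list."""

    def __init__(self, probe: Callable[[], list]):
        self._probe = probe  # expected to raise OSError when the server is down
        self._last_probe: Optional[float] = None
        self.auth_status = OFFLINE
        self.models: list = []

    def _refresh(self, now: float) -> None:
        self._last_probe = now
        try:
            self.models = self._probe()
            self.auth_status = ONLINE
        except OSError:
            self.auth_status = OFFLINE  # keep the last known models (assumption)

    def tick(self, now: float) -> None:
        """Background-loop hook: re-probe only once the interval has elapsed."""
        if self._last_probe is None or now - self._last_probe >= REPROBE_INTERVAL_S:
            self._refresh(now)

    def test_connection(self, now: float) -> None:
        """Like the dashboard button: re-probe now and restart the interval."""
        self._refresh(now)


server = {"up": True, "models": ["llama3.2"]}


def fake_probe() -> list:
    if not server["up"]:
        raise ConnectionRefusedError("server stopped")
    return list(server["models"])


p = LocalProviderProbe(fake_probe)
p.tick(0.0)                            # startup probe
server["models"].append("qwen2.5:7b")  # `ollama pull` while the daemon runs
p.tick(30.0)                           # too early: still ['llama3.2']
p.tick(60.0)                           # periodic tick picks up the new model
server["up"] = False
p.test_connection(75.0)                # forced re-probe, no 60 s wait
print(p.auth_status)                   # local_offline
```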