Local & Self-Hosted Providers

This page covers providers that run on your machine, inside your network, or behind a self-hosted inference endpoint that LibreFang can reach directly.

Ollama


Display Name	Ollama
Driver	OpenAI-compatible
Env Var	`OLLAMA_API_KEY` (not required)
Base URL	`http://localhost:11434/v1`
Key Required	No
Free Tier	Free (local)
Auth	None (local)
Models	3 builtin + auto-discovered

Available Models (builtin):

llama3.2 (Local)
mistral:latest (Local)
phi3 (Local)

Setup:

Install Ollama from ollama.com
Pull a model: ollama pull llama3.2
Start the server: ollama serve
No env var needed -- Ollama is always available

Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.

vLLM


Display Name	vLLM
Driver	OpenAI-compatible
Env Var	`VLLM_API_KEY` (not required)
Base URL	`http://localhost:8000/v1`
Key Required	No
Free Tier	Free (self-hosted)
Auth	None (local)
Models	1 builtin + auto-discovered

Available Models (builtin):

vllm-local (Local)

Setup:

Install vLLM: pip install vllm
Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
No env var needed

LM Studio


Display Name	LM Studio
Driver	OpenAI-compatible
Env Var	`LMSTUDIO_API_KEY` (not required)
Base URL	`http://localhost:1234/v1`
Key Required	No
Free Tier	Free (local)
Auth	None (local)
Models	1 builtin + auto-discovered

Available Models (builtin):

lmstudio-local (Local)

Setup:

Download LM Studio from lmstudio.ai
Download a model from the built-in model browser
Start the local server from the "Local Server" tab
No env var needed

When the embeddings driver is set to provider = "auto", LibreFang walks a priority list to pick the first reachable embeddings backend. Cloud API keys win first; local servers come after. The current order (verified against librefang-runtime::embedding::detect_embedding_provider):

OPENAI_API_KEY → "openai"
OPENROUTER_API_KEY → "openrouter"
MISTRAL_API_KEY → "mistral"
TOGETHER_API_KEY → "together"
FIREWORKS_API_KEY → "fireworks"
COHERE_API_KEY → "cohere"
OLLAMA_HOST (local)
VLLM_BASE_URL (local)
LMSTUDIO_BASE_URL (local)

Setting any of OLLAMA_HOST / VLLM_BASE_URL / LMSTUDIO_BASE_URL to a reachable URL is enough — no API key needed for the local entries. The same priority is applied by create_embedding_driver, so what auto picks at startup matches what an explicit provider = "vllm" would build.

GROQ_API_KEY is deliberately not in the ladder — Groq exposes no /v1/embeddings endpoint, so auto-picking it would produce silent 404s at the first real call. Likewise, providers without a wired embedding driver (VOYAGE, JINA, MIXEDBREAD, BEDROCK for auto-pick — Bedrock works when explicitly configured) are not auto-detected even if their chat keys are set.

Capability detection (Ollama metadata pipeline)

For Ollama, LibreFang probes /api/tags at startup and on the periodic provider refresh, then derives each model's capabilities from the metadata it returns:

Capability	How it's inferred
`vision`	Model card declares an image-input modality, or the family is in the known-vision list (`llava`, `bakllava`, `clip`, …)
`embedding`	Model card declares an embedding-only architecture (`nomic-embed-text`, `bge-m3`, `mxbai-embed-large`, …)
`tools`	Model card declares tool-use support (newer Llama / Qwen / Mistral instruct families)
`parameter_size`	Lifted verbatim from the Ollama tag (e.g. `7.6B`, `14.8B`)
`quantization_level`	Lifted verbatim (e.g. `Q4_K_M`, `F16`)

The dashboard's "Models" tab and the model-switcher dropdown show these badges per model, and they feed into the model_catalog's Modality enum (Text / Image / Audio) — a caller asking for image-generation models filters cleanly without a hard-coded list.

vLLM and LM Studio receive the same probe via their OpenAI-compat /models endpoint, but the Ollama path populates the most fields because the upstream API exposes the richest data.

Reprobe cadence

Local providers are probed at startup and re-probed every 60 seconds in the background — so a model you ollama pull while the daemon is running shows up in the catalog without a daemon restart, and a stopped local server flips to auth_status: "local_offline" automatically. The dashboard's "Test connection" button on a provider card forces an immediate re-probe + cache refresh so you don't have to wait for the next periodic tick.

Local & Self-Hosted Providers

Included Providers

Ollama

vLLM

LM Studio

Embedding auto-detection

Capability detection (Ollama metadata pipeline)

Reprobe cadence