Local & Self-Hosted Providers

This page covers providers that run on your machine, inside your network, or behind a self-hosted inference endpoint that LibreFang can reach directly.

Included Providers

  • Ollama
  • vLLM
  • LM Studio

Ollama

Display NameOllama
DriverOpenAI-compatible
Env VarOLLAMA_API_KEY (not required)
Base URLhttp://localhost:11434/v1
Key RequiredNo
Free TierFree (local)
AuthNone (local)
Models3 builtin + auto-discovered

Available Models (builtin):

  • llama3.2 (Local)
  • mistral:latest (Local)
  • phi3 (Local)

Setup:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start the server: ollama serve
  4. No env var needed -- Ollama is always available

Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.


vLLM

Display NamevLLM
DriverOpenAI-compatible
Env VarVLLM_API_KEY (not required)
Base URLhttp://localhost:8000/v1
Key RequiredNo
Free TierFree (self-hosted)
AuthNone (local)
Models1 builtin + auto-discovered

Available Models (builtin):

  • vllm-local (Local)

Setup:

  1. Install vLLM: pip install vllm
  2. Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
  3. No env var needed

LM Studio

Display NameLM Studio
DriverOpenAI-compatible
Env VarLMSTUDIO_API_KEY (not required)
Base URLhttp://localhost:1234/v1
Key RequiredNo
Free TierFree (local)
AuthNone (local)
Models1 builtin + auto-discovered

Available Models (builtin):

  • lmstudio-local (Local)

Setup:

  1. Download LM Studio from lmstudio.ai
  2. Download a model from the built-in model browser
  3. Start the local server from the "Local Server" tab
  4. No env var needed

Embedding auto-detection

When the embeddings driver is set to provider = "auto", LibreFang walks a priority list to pick the first reachable embeddings backend. Cloud API keys win first; local servers come after. The current order (verified against librefang-runtime::embedding::detect_embedding_provider):

  1. OPENAI_API_KEY"openai"
  2. OPENROUTER_API_KEY"openrouter"
  3. MISTRAL_API_KEY"mistral"
  4. TOGETHER_API_KEY"together"
  5. FIREWORKS_API_KEY"fireworks"
  6. COHERE_API_KEY"cohere"
  7. OLLAMA_HOST (local)
  8. VLLM_BASE_URL (local)
  9. LMSTUDIO_BASE_URL (local)

Setting any of OLLAMA_HOST / VLLM_BASE_URL / LMSTUDIO_BASE_URL to a reachable URL is enough — no API key needed for the local entries. The same priority is applied by create_embedding_driver, so what auto picks at startup matches what an explicit provider = "vllm" would build.

GROQ_API_KEY is deliberately not in the ladder — Groq exposes no /v1/embeddings endpoint, so auto-picking it would produce silent 404s at the first real call. Likewise, providers without a wired embedding driver (VOYAGE, JINA, MIXEDBREAD, BEDROCK for auto-pick — Bedrock works when explicitly configured) are not auto-detected even if their chat keys are set.


Capability detection (Ollama metadata pipeline)

For Ollama, LibreFang probes /api/tags at startup and on the periodic provider refresh, then derives each model's capabilities from the metadata it returns:

CapabilityHow it's inferred
visionModel card declares an image-input modality, or the family is in the known-vision list (llava, bakllava, clip, …)
embeddingModel card declares an embedding-only architecture (nomic-embed-text, bge-m3, mxbai-embed-large, …)
toolsModel card declares tool-use support (newer Llama / Qwen / Mistral instruct families)
parameter_sizeLifted verbatim from the Ollama tag (e.g. 7.6B, 14.8B)
quantization_levelLifted verbatim (e.g. Q4_K_M, F16)

The dashboard's "Models" tab and the model-switcher dropdown show these badges per model, and they feed into the model_catalog's Modality enum (Text / Image / Audio) — a caller asking for image-generation models filters cleanly without a hard-coded list.

vLLM and LM Studio receive the same probe via their OpenAI-compat /models endpoint, but the Ollama path populates the most fields because the upstream API exposes the richest data.

Reprobe cadence

Local providers are probed at startup and re-probed every 60 seconds in the background — so a model you ollama pull while the daemon is running shows up in the catalog without a daemon restart, and a stopped local server flips to auth_status: "local_offline" automatically. The dashboard's "Test connection" button on a provider card forces an immediate re-probe + cache refresh so you don't have to wait for the next periodic tick.