Platforms & Managed Endpoints

This page covers providers that route through a platform-specific gateway, enterprise cloud integration, managed inference fabric, or regional model marketplace rather than a single first-party vendor API.

Included Providers

Replicate
NVIDIA NIM
DeepInfra
Azure OpenAI
GitHub Models (Azure AI Inference)
Qwen (DashScope)
MiniMax
Qianfan (Baidu)
VolcEngine (Doubao)
BytePlus ModelArk
Zhipu (GLM)
Zhipu Coding (CodeGeex)
Z.ai
Vertex AI

Note on Bedrock: AWS Bedrock is documented in Hosted APIs since the current driver authenticates via a long-lived bearer token (AWS_BEARER_TOKEN_BEDROCK) rather than per-request SigV4 signing. The older SigV4 entry that used to live on this page has been removed.

Replicate


Display Name	Replicate
Driver	OpenAI-compatible
Env Var	`REPLICATE_API_TOKEN`
Base URL	`https://api.replicate.com/v1`
Key Required	Yes
Free Tier	No
Auth	`Authorization: Bearer` header
Models	1

Available Models:

replicate/meta-llama-3.3-70b-instruct (Balanced)

Setup:

Sign up at replicate.com
Go to Account > API Tokens
export REPLICATE_API_TOKEN="r8_..."

NVIDIA NIM


Display Name	NVIDIA NIM
Driver	OpenAI-compatible
Env Var	`NVIDIA_API_KEY`
Base URL	`https://integrate.api.nvidia.com/v1`
Key Required	Yes
Free Tier	Yes (limited credits)
Auth	`Authorization: Bearer` header
Models	Llama, Mistral, and NVIDIA-optimized models

Setup:

Sign up at build.nvidia.com
Create an API key
export NVIDIA_API_KEY="nvapi-..."

DeepInfra


Display Name	DeepInfra
Driver	OpenAI-compatible
Env Var	`DEEPINFRA_API_KEY`
Base URL	`https://api.deepinfra.com/v1/openai`
Key Required	Yes
Free Tier	Yes (limited credits)
Auth	`Authorization: Bearer` header
Models	Open-source models at low cost

Setup:

Sign up at deepinfra.com
Create an API key
export DEEPINFRA_API_KEY="..."

Azure OpenAI


Display Name	Azure OpenAI
Driver	OpenAI-compatible
Env Var	`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`
Base URL	`https://<your-resource>.openai.azure.com/openai/deployments/<deployment>`
Key Required	Yes
Free Tier	No
Auth	`api-key` header
Models	GPT-4o, GPT-4, and other Azure-hosted models

Setup:

Create an Azure OpenAI resource in the Azure Portal
Deploy a model in Azure OpenAI Studio

Set environment variables:

export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"

GitHub Models (Azure AI Inference)


Display Name	Microsoft
Provider ID	`microsoft` (alias: `github-models`)
Driver	OpenAI-compatible
Env Var	`GITHUB_MODELS_TOKEN`
Base URL	`https://models.inference.ai.azure.com`
Key Required	Yes
Free Tier	Yes (rate-limited per GitHub plan)
Auth	`Authorization: Bearer` header

Setup:

Sign in to github.com/marketplace/models
Generate a fine-grained Personal Access Token (no repository scopes are required for the Models endpoint)
export GITHUB_MODELS_TOKEN="ghp_..." (the same PAT works as GITHUB_TOKEN for github-copilot, but the env vars are kept separate so configuring one product doesn't auto-activate the other)

Minimal config.toml:

[default_model]
provider = "microsoft"
model = "phi-4"

Notes: Despite the microsoft provider id, this is the GitHub Models / Azure AI Inference endpoint, distinct from azure-openai (which addresses your own Azure OpenAI deployments via AZURE_OPENAI_ENDPOINT) and from github-copilot (which is the IDE-side Copilot subscription). Models are catalog-driven and include third-party hosts like meta-llama-3-70b-instruct, mistral-large, phi-4, etc.

Qwen (DashScope)


Display Name	Qwen
Driver	OpenAI-compatible
Env Var	`DASHSCOPE_API_KEY`
Base URL	`https://dashscope.aliyuncs.com/compatible-mode/v1`
Aliases	`dashscope`, `model_studio`
Key Required	Yes
Free Tier	Yes (limited credits on signup)
Auth	`Authorization: Bearer` header

Regions:

Region	Endpoint	API Key Env
(default)	`dashscope.aliyuncs.com`	`DASHSCOPE_API_KEY`
`intl`	`dashscope-intl.aliyuncs.com`	`DASHSCOPE_API_KEY`
`us`	`dashscope-us.aliyuncs.com`	`DASHSCOPE_API_KEY`

Setup:

Sign up at DashScope Console
Create an API key
export DASHSCOPE_API_KEY="sk-..."

Optionally select a region in config.toml:

[provider_regions]
qwen = "intl"    # or "us"

Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.

MiniMax


Display Name	MiniMax
Driver	OpenAI-compatible
Env Var	`MINIMAX_API_KEY`
Base URL	`https://api.minimax.io/v1`
Key Required	Yes
Free Tier	No
Auth	`Authorization: Bearer` header

Regions:

Region	Endpoint	API Key Env
(default)	`api.minimax.io`	`MINIMAX_API_KEY`
`china`	`api.minimaxi.com`	`MINIMAX_CN_API_KEY`

Setup:

Sign up at minimax.io (international) or minimaxi.com (China)
Create an API key
export MINIMAX_API_KEY="..."

For China region:

[provider_regions]
minimax = "china"

export MINIMAX_CN_API_KEY="..."

Media Generation: In addition to LLM chat models, MiniMax provides media generation capabilities when used with the Creator Hand or media API endpoints:

Modality	Model	Description
Image	`image-01`	Text-to-image generation
TTS	`speech-2.8-hd`	High-quality text-to-speech
Video	`T2V-01`	Text-to-video generation (async)
Music	`music-2.5`	Music generation with optional lyrics

These are accessed via /api/media/* endpoints or through the image_generate, video_generate, music_generate, text_to_speech tools.

Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.

Qianfan (Baidu)


Display Name	Qianfan
Provider ID	`qianfan` (alias: `baidu`)
Driver	OpenAI-compatible
Env Var	`QIANFAN_API_KEY`
Base URL	`https://qianfan.baidubce.com/v2`
Key Required	Yes
Free Tier	Limited credits on signup
Auth	`Authorization: Bearer` header

Setup:

Sign up at the Baidu Qianfan console
Create an API key under the IAM panel
export QIANFAN_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "qianfan"
model = "ernie-4.0-8k"

Notes: Qianfan is Baidu's LLM platform; the ernie-4.0, ernie-4.0-turbo, and ernie-speed families are exposed via the OpenAI-compatible v2 endpoint. Tools and function calling are supported on ernie-4.0+.

VolcEngine (Doubao)


Display Name	VolcEngine / Doubao
Provider ID	`volcengine` (alias: `doubao`)
Driver	OpenAI-compatible
Env Var	`VOLCENGINE_API_KEY`
Base URL	`https://ark.cn-beijing.volces.com/api/v3`
Key Required	Yes
Free Tier	Free credits on signup
Auth	`Authorization: Bearer` header

Setup:

Sign up at the VolcEngine console
Create an inference endpoint in Ark > Endpoints and note its model ID (ep-...)
Generate an API key under Ark > API Keys
export VOLCENGINE_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "volcengine"
model = "ep-20240101120000-abcde"   # endpoint ID, not the model name

Notes: Doubao is ByteDance's LLM product, exposed via VolcEngine's Ark inference platform. Unlike most providers, the model field must be the endpoint ID (ep-...), not the model family name. Vision and tools are supported on Doubao 1.5 / 1.6 endpoints. The Coding Plan endpoint (https://ark.cn-beijing.volces.com/api/coding/v3) is registered as the separate provider volcengine_coding, which uses its own VOLCENGINE_CODING_API_KEY — set the same key value under both env vars if you use both endpoints.

BytePlus ModelArk

International edition of VolcEngine. Same Ark platform, different region (ap-southeast) and pricing in USD. The two endpoints below have separate env vars (BYTEPLUS_API_KEY for byteplus, BYTEPLUS_CODING_API_KEY for byteplus_coding) — the same BytePlus account key works for both, you just export it under each name you actually use.

`byteplus` — standard `/api/v3` endpoint


Display Name	BytePlus ModelArk
Provider ID	`byteplus`
Driver	OpenAI-compatible
Env Var	`BYTEPLUS_API_KEY`
Base URL	`https://ark.ap-southeast.bytepluses.com/api/v3`
Key Required	Yes
Auth	`Authorization: Bearer` header

[default_model]
provider = "byteplus"
model = "seed-2-0-pro-260328"       # versioned snapshot, see byteplus.toml

`byteplus_coding` — Anthropic-compatible coding endpoint


Display Name	BytePlus Coding Plan
Provider ID	`byteplus_coding`
Driver	Anthropic-compatible (`/v1/messages`)
Env Var	`BYTEPLUS_CODING_API_KEY`
Base URL	`https://ark.ap-southeast.bytepluses.com/api/coding`
Auth	`x-api-key` + `anthropic-version` headers

[default_model]
provider = "byteplus_coding"
model = "ark-code-latest"           # auto-routed friendly alias

Setup:

Sign up at the BytePlus console
Activate the models you want under ModelArk > Model Inventory
Generate an API key under ModelArk > API Keys
export BYTEPLUS_API_KEY="..." (for the standard byteplus endpoint)
export BYTEPLUS_CODING_API_KEY="..." (for the byteplus_coding endpoint — same key value, separate env)

Notes: byteplus uses versioned snapshot IDs like seed-2-0-pro-260328. byteplus_coding uses friendly aliases like ark-code-latest (auto-router), dola-seed-2.0-pro, kimi-k2.5, glm-4.7, gpt-oss-120b — these stay stable across model upgrades.

Zhipu (GLM)


Display Name	Zhipu GLM
Provider ID	`zhipu` (alias: `glm`)
Driver	OpenAI-compatible
Env Var	`ZHIPU_API_KEY`
Base URL	`https://open.bigmodel.cn/api/paas/v4`
Key Required	Yes
Free Tier	Yes (per-month free quota)
Auth	`Authorization: Bearer` header

Setup:

Sign up at the Zhipu open platform
Generate an API key under Account > API Keys
export ZHIPU_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "zhipu"
model = "glm-4-plus"

Notes: Zhipu develops the GLM (General Language Model) family. Tools, vision, and embeddings supported. z.ai (below) is the international-region front-end of the same backend but is registered as its own provider with its own ZAI_API_KEY. The Coding Plan endpoints (zhipu_coding, zai_coding) similarly use their own ZHIPU_CODING_API_KEY and ZAI_CODING_API_KEY. Register once at Zhipu and re-export the same key value under each env var you actually use.

Zhipu Coding (CodeGeex)


Display Name	CodeGeex
Provider ID	`zhipu-coding` (alias: `codegeex`)
Driver	OpenAI-compatible
Env Var	`ZHIPU_CODING_API_KEY`
Base URL	`https://open.bigmodel.cn/api/coding/paas/v4`
Key Required	Yes
Free Tier	Yes (counted against the same Zhipu quota)
Auth	`Authorization: Bearer` header

Setup:

Use the same Zhipu API key from above (register once on the Zhipu open platform)
export ZHIPU_CODING_API_KEY="..." (same value as ZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)

Minimal config.toml:

[default_model]
provider = "zhipu-coding"
model = "codegeex-4"

Notes: Coding-specialized endpoint serving CodeGeex models. Same auth as Zhipu/GLM but different base URL — use this provider when you want fill-in-the-middle / repo-aware coding completions instead of generic chat.

Z.ai


Display Name	Z.ai
Provider ID	`z.ai`
Driver	OpenAI-compatible
Env Var	`ZAI_API_KEY`
Base URL	`https://api.z.ai/api/paas/v4`
Key Required	Yes
Free Tier	Yes (counted against the same Zhipu quota)
Auth	`Authorization: Bearer` header

Setup:

Use the same Zhipu API key (register once on the Zhipu open platform)
export ZAI_API_KEY="..." (same value as ZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)

Minimal config.toml:

[default_model]
provider = "z.ai"
model = "glm-4-plus"

Notes: Branded Z.ai endpoint — same Zhipu backend, different routing for international traffic. The coding variant zai_coding (base URL https://api.z.ai/api/coding/paas/v4) is registered as its own provider; it uses ZAI_CODING_API_KEY (re-export the same Zhipu key value).

Vertex AI


Display Name	Google Vertex AI
Driver	Native Gemini (generateContent API via Vertex)
Config Section	`[vertex_ai]`
Env Var	`GOOGLE_APPLICATION_CREDENTIALS`, `VERTEX_PROJECT`, `VERTEX_LOCATION`
Base URL	`https://<location>-aiplatform.googleapis.com`
Key Required	Yes (service account JSON or gcloud CLI)
Free Tier	No
Auth	OAuth2 service account or `gcloud auth print-access-token`
Models	Gemini models via Google Cloud Vertex AI enterprise endpoint

Setup:

Enable Vertex AI API in Google Cloud Console
Either create a service account key file, or authenticate with gcloud auth application-default login

Set environment variables:

# Option A: Service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export VERTEX_PROJECT="your-gcp-project"
export VERTEX_LOCATION="us-central1"

# Option B: gcloud CLI (no key file needed)
gcloud auth application-default login
export VERTEX_PROJECT="your-gcp-project"
export VERTEX_LOCATION="us-central1"

Configure in config.toml:

[vertex_ai]
project = "your-gcp-project"
location = "us-central1"

Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.

Platforms & Managed Endpoints

byteplus — standard /api/v3 endpoint

byteplus_coding — Anthropic-compatible coding endpoint

`byteplus` — standard `/api/v3` endpoint

`byteplus_coding` — Anthropic-compatible coding endpoint