Platforms & Managed Endpoints
This page covers providers that route through a platform-specific gateway, enterprise cloud integration, managed inference fabric, or regional model marketplace rather than a single first-party vendor API.
Included Providers
- Replicate
- NVIDIA NIM
- DeepInfra
- Azure OpenAI
- GitHub Models (Azure AI Inference)
- Qwen (DashScope)
- MiniMax
- Qianfan (Baidu)
- VolcEngine (Doubao)
- BytePlus ModelArk
- Zhipu (GLM)
- Zhipu Coding (CodeGeex)
- Z.ai
- Vertex AI
Note on Bedrock: AWS Bedrock is documented in Hosted APIs since the current driver authenticates via a long-lived bearer token (
AWS_BEARER_TOKEN_BEDROCK) rather than per-request SigV4 signing. The older SigV4 entry that used to live on this page has been removed.
Replicate
| Display Name | Replicate |
| Driver | OpenAI-compatible |
| Env Var | REPLICATE_API_TOKEN |
| Base URL | https://api.replicate.com/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
| Models | 1 |
Available Models:
replicate/meta-llama-3.3-70b-instruct(Balanced)
Setup:
- Sign up at replicate.com
- Go to Account > API Tokens
export REPLICATE_API_TOKEN="r8_..."
NVIDIA NIM
| Display Name | NVIDIA NIM |
| Driver | OpenAI-compatible |
| Env Var | NVIDIA_API_KEY |
| Base URL | https://integrate.api.nvidia.com/v1 |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Llama, Mistral, and NVIDIA-optimized models |
Setup:
- Sign up at build.nvidia.com
- Create an API key
export NVIDIA_API_KEY="nvapi-..."
DeepInfra
| Display Name | DeepInfra |
| Driver | OpenAI-compatible |
| Env Var | DEEPINFRA_API_KEY |
| Base URL | https://api.deepinfra.com/v1/openai |
| Key Required | Yes |
| Free Tier | Yes (limited credits) |
| Auth | Authorization: Bearer header |
| Models | Open-source models at low cost |
Setup:
- Sign up at deepinfra.com
- Create an API key
export DEEPINFRA_API_KEY="..."
Azure OpenAI
| Display Name | Azure OpenAI |
| Driver | OpenAI-compatible |
| Env Var | AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT |
| Base URL | https://<your-resource>.openai.azure.com/openai/deployments/<deployment> |
| Key Required | Yes |
| Free Tier | No |
| Auth | api-key header |
| Models | GPT-4o, GPT-4, and other Azure-hosted models |
Setup:
- Create an Azure OpenAI resource in the Azure Portal
- Deploy a model in Azure OpenAI Studio
- Set environment variables:
export AZURE_OPENAI_API_KEY="..." export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
GitHub Models (Azure AI Inference)
| Display Name | Microsoft |
| Provider ID | microsoft (alias: github-models) |
| Driver | OpenAI-compatible |
| Env Var | GITHUB_MODELS_TOKEN |
| Base URL | https://models.inference.ai.azure.com |
| Key Required | Yes |
| Free Tier | Yes (rate-limited per GitHub plan) |
| Auth | Authorization: Bearer header |
Setup:
- Sign in to github.com/marketplace/models
- Generate a fine-grained Personal Access Token (no repository scopes are required for the Models endpoint)
export GITHUB_MODELS_TOKEN="ghp_..."(the same PAT works asGITHUB_TOKENforgithub-copilot, but the env vars are kept separate so configuring one product doesn't auto-activate the other)
Minimal config.toml:
[default_model]
provider = "microsoft"
model = "phi-4"
Notes: Despite the microsoft provider id, this is the GitHub Models / Azure AI Inference endpoint, distinct from azure-openai (which addresses your own Azure OpenAI deployments via AZURE_OPENAI_ENDPOINT) and from github-copilot (which is the IDE-side Copilot subscription). Models are catalog-driven and include third-party hosts like meta-llama-3-70b-instruct, mistral-large, phi-4, etc.
Qwen (DashScope)
| Display Name | Qwen |
| Driver | OpenAI-compatible |
| Env Var | DASHSCOPE_API_KEY |
| Base URL | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Aliases | dashscope, model_studio |
| Key Required | Yes |
| Free Tier | Yes (limited credits on signup) |
| Auth | Authorization: Bearer header |
Regions:
| Region | Endpoint | API Key Env |
|---|---|---|
| (default) | dashscope.aliyuncs.com | DASHSCOPE_API_KEY |
intl | dashscope-intl.aliyuncs.com | DASHSCOPE_API_KEY |
us | dashscope-us.aliyuncs.com | DASHSCOPE_API_KEY |
Setup:
- Sign up at DashScope Console
- Create an API key
export DASHSCOPE_API_KEY="sk-..."- Optionally select a region in
config.toml:[provider_regions] qwen = "intl" # or "us"
Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.
MiniMax
| Display Name | MiniMax |
| Driver | OpenAI-compatible |
| Env Var | MINIMAX_API_KEY |
| Base URL | https://api.minimax.io/v1 |
| Key Required | Yes |
| Free Tier | No |
| Auth | Authorization: Bearer header |
Regions:
| Region | Endpoint | API Key Env |
|---|---|---|
| (default) | api.minimax.io | MINIMAX_API_KEY |
china | api.minimaxi.com | MINIMAX_CN_API_KEY |
Setup:
- Sign up at minimax.io (international) or minimaxi.com (China)
- Create an API key
export MINIMAX_API_KEY="..."- For China region:
[provider_regions] minimax = "china"export MINIMAX_CN_API_KEY="..."
Media Generation: In addition to LLM chat models, MiniMax provides media generation capabilities when used with the Creator Hand or media API endpoints:
| Modality | Model | Description |
|---|---|---|
| Image | image-01 | Text-to-image generation |
| TTS | speech-2.8-hd | High-quality text-to-speech |
| Video | T2V-01 | Text-to-video generation (async) |
| Music | music-2.5 | Music generation with optional lyrics |
These are accessed via /api/media/* endpoints or through the image_generate, video_generate, music_generate, text_to_speech tools.
Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.
Qianfan (Baidu)
| Display Name | Qianfan |
| Provider ID | qianfan (alias: baidu) |
| Driver | OpenAI-compatible |
| Env Var | QIANFAN_API_KEY |
| Base URL | https://qianfan.baidubce.com/v2 |
| Key Required | Yes |
| Free Tier | Limited credits on signup |
| Auth | Authorization: Bearer header |
Setup:
- Sign up at the Baidu Qianfan console
- Create an API key under the IAM panel
export QIANFAN_API_KEY="..."
Minimal config.toml:
[default_model]
provider = "qianfan"
model = "ernie-4.0-8k"
Notes: Qianfan is Baidu's LLM platform; the ernie-4.0, ernie-4.0-turbo, and ernie-speed families are exposed via the OpenAI-compatible v2 endpoint. Tools and function calling are supported on ernie-4.0+.
VolcEngine (Doubao)
| Display Name | VolcEngine / Doubao |
| Provider ID | volcengine (alias: doubao) |
| Driver | OpenAI-compatible |
| Env Var | VOLCENGINE_API_KEY |
| Base URL | https://ark.cn-beijing.volces.com/api/v3 |
| Key Required | Yes |
| Free Tier | Free credits on signup |
| Auth | Authorization: Bearer header |
Setup:
- Sign up at the VolcEngine console
- Create an inference endpoint in Ark > Endpoints and note its model ID (
ep-...) - Generate an API key under Ark > API Keys
export VOLCENGINE_API_KEY="..."
Minimal config.toml:
[default_model]
provider = "volcengine"
model = "ep-20240101120000-abcde" # endpoint ID, not the model name
Notes: Doubao is ByteDance's LLM product, exposed via VolcEngine's Ark inference platform. Unlike most providers, the model field must be the endpoint ID (ep-...), not the model family name. Vision and tools are supported on Doubao 1.5 / 1.6 endpoints. The Coding Plan endpoint (https://ark.cn-beijing.volces.com/api/coding/v3) is registered as the separate provider volcengine_coding, which uses its own VOLCENGINE_CODING_API_KEY — set the same key value under both env vars if you use both endpoints.
BytePlus ModelArk
International edition of VolcEngine. Same Ark platform, different region (ap-southeast) and pricing in USD. The two endpoints below have separate env vars (BYTEPLUS_API_KEY for byteplus, BYTEPLUS_CODING_API_KEY for byteplus_coding) — the same BytePlus account key works for both, you just export it under each name you actually use.
byteplus — standard /api/v3 endpoint
| Display Name | BytePlus ModelArk |
| Provider ID | byteplus |
| Driver | OpenAI-compatible |
| Env Var | BYTEPLUS_API_KEY |
| Base URL | https://ark.ap-southeast.bytepluses.com/api/v3 |
| Key Required | Yes |
| Auth | Authorization: Bearer header |
[default_model]
provider = "byteplus"
model = "seed-2-0-pro-260328" # versioned snapshot, see byteplus.toml
byteplus_coding — Anthropic-compatible coding endpoint
| Display Name | BytePlus Coding Plan |
| Provider ID | byteplus_coding |
| Driver | Anthropic-compatible (/v1/messages) |
| Env Var | BYTEPLUS_CODING_API_KEY |
| Base URL | https://ark.ap-southeast.bytepluses.com/api/coding |
| Auth | x-api-key + anthropic-version headers |
[default_model]
provider = "byteplus_coding"
model = "ark-code-latest" # auto-routed friendly alias
Setup:
- Sign up at the BytePlus console
- Activate the models you want under ModelArk > Model Inventory
- Generate an API key under ModelArk > API Keys
export BYTEPLUS_API_KEY="..."(for the standardbyteplusendpoint)export BYTEPLUS_CODING_API_KEY="..."(for thebyteplus_codingendpoint — same key value, separate env)
Notes: byteplus uses versioned snapshot IDs like seed-2-0-pro-260328. byteplus_coding uses friendly aliases like ark-code-latest (auto-router), dola-seed-2.0-pro, kimi-k2.5, glm-4.7, gpt-oss-120b — these stay stable across model upgrades.
Zhipu (GLM)
| Display Name | Zhipu GLM |
| Provider ID | zhipu (alias: glm) |
| Driver | OpenAI-compatible |
| Env Var | ZHIPU_API_KEY |
| Base URL | https://open.bigmodel.cn/api/paas/v4 |
| Key Required | Yes |
| Free Tier | Yes (per-month free quota) |
| Auth | Authorization: Bearer header |
Setup:
- Sign up at the Zhipu open platform
- Generate an API key under Account > API Keys
export ZHIPU_API_KEY="..."
Minimal config.toml:
[default_model]
provider = "zhipu"
model = "glm-4-plus"
Notes: Zhipu develops the GLM (General Language Model) family. Tools, vision, and embeddings supported. z.ai (below) is the international-region front-end of the same backend but is registered as its own provider with its own ZAI_API_KEY. The Coding Plan endpoints (zhipu_coding, zai_coding) similarly use their own ZHIPU_CODING_API_KEY and ZAI_CODING_API_KEY. Register once at Zhipu and re-export the same key value under each env var you actually use.
Zhipu Coding (CodeGeex)
| Display Name | CodeGeex |
| Provider ID | zhipu-coding (alias: codegeex) |
| Driver | OpenAI-compatible |
| Env Var | ZHIPU_CODING_API_KEY |
| Base URL | https://open.bigmodel.cn/api/coding/paas/v4 |
| Key Required | Yes |
| Free Tier | Yes (counted against the same Zhipu quota) |
| Auth | Authorization: Bearer header |
Setup:
- Use the same Zhipu API key from above (register once on the Zhipu open platform)
export ZHIPU_CODING_API_KEY="..."(same value asZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)
Minimal config.toml:
[default_model]
provider = "zhipu-coding"
model = "codegeex-4"
Notes: Coding-specialized endpoint serving CodeGeex models. Same auth as Zhipu/GLM but different base URL — use this provider when you want fill-in-the-middle / repo-aware coding completions instead of generic chat.
Z.ai
| Display Name | Z.ai |
| Provider ID | z.ai |
| Driver | OpenAI-compatible |
| Env Var | ZAI_API_KEY |
| Base URL | https://api.z.ai/api/paas/v4 |
| Key Required | Yes |
| Free Tier | Yes (counted against the same Zhipu quota) |
| Auth | Authorization: Bearer header |
Setup:
- Use the same Zhipu API key (register once on the Zhipu open platform)
export ZAI_API_KEY="..."(same value asZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)
Minimal config.toml:
[default_model]
provider = "z.ai"
model = "glm-4-plus"
Notes: Branded Z.ai endpoint — same Zhipu backend, different routing for international traffic. The coding variant zai_coding (base URL https://api.z.ai/api/coding/paas/v4) is registered as its own provider; it uses ZAI_CODING_API_KEY (re-export the same Zhipu key value).
Vertex AI
| Display Name | Google Vertex AI |
| Driver | Native Gemini (generateContent API via Vertex) |
| Config Section | [vertex_ai] |
| Env Var | GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION |
| Base URL | https://<location>-aiplatform.googleapis.com |
| Key Required | Yes (service account JSON or gcloud CLI) |
| Free Tier | No |
| Auth | OAuth2 service account or gcloud auth print-access-token |
| Models | Gemini models via Google Cloud Vertex AI enterprise endpoint |
Setup:
- Enable Vertex AI API in Google Cloud Console
- Either create a service account key file, or authenticate with
gcloud auth application-default login - Set environment variables:
# Option A: Service account key file export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json" export VERTEX_PROJECT="your-gcp-project" export VERTEX_LOCATION="us-central1" # Option B: gcloud CLI (no key file needed) gcloud auth application-default login export VERTEX_PROJECT="your-gcp-project" export VERTEX_LOCATION="us-central1" - Configure in
config.toml:[vertex_ai] project = "your-gcp-project" location = "us-central1"
Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.