Platforms & Managed Endpoints

This page covers providers that route through a platform-specific gateway, enterprise cloud integration, managed inference fabric, or regional model marketplace rather than a single first-party vendor API.

Included Providers

  • Replicate
  • NVIDIA NIM
  • DeepInfra
  • Azure OpenAI
  • GitHub Models (Azure AI Inference)
  • Qwen (DashScope)
  • MiniMax
  • Qianfan (Baidu)
  • VolcEngine (Doubao)
  • BytePlus ModelArk
  • Zhipu (GLM)
  • Zhipu Coding (CodeGeex)
  • Z.ai
  • Vertex AI

Note on Bedrock: AWS Bedrock is documented in Hosted APIs since the current driver authenticates via a long-lived bearer token (AWS_BEARER_TOKEN_BEDROCK) rather than per-request SigV4 signing. The older SigV4 entry that used to live on this page has been removed.


Replicate

Display NameReplicate
DriverOpenAI-compatible
Env VarREPLICATE_API_TOKEN
Base URLhttps://api.replicate.com/v1
Key RequiredYes
Free TierNo
AuthAuthorization: Bearer header
Models1

Available Models:

  • replicate/meta-llama-3.3-70b-instruct (Balanced)

Setup:

  1. Sign up at replicate.com
  2. Go to Account > API Tokens
  3. export REPLICATE_API_TOKEN="r8_..."

NVIDIA NIM

Display NameNVIDIA NIM
DriverOpenAI-compatible
Env VarNVIDIA_API_KEY
Base URLhttps://integrate.api.nvidia.com/v1
Key RequiredYes
Free TierYes (limited credits)
AuthAuthorization: Bearer header
ModelsLlama, Mistral, and NVIDIA-optimized models

Setup:

  1. Sign up at build.nvidia.com
  2. Create an API key
  3. export NVIDIA_API_KEY="nvapi-..."

DeepInfra

Display NameDeepInfra
DriverOpenAI-compatible
Env VarDEEPINFRA_API_KEY
Base URLhttps://api.deepinfra.com/v1/openai
Key RequiredYes
Free TierYes (limited credits)
AuthAuthorization: Bearer header
ModelsOpen-source models at low cost

Setup:

  1. Sign up at deepinfra.com
  2. Create an API key
  3. export DEEPINFRA_API_KEY="..."

Azure OpenAI

Display NameAzure OpenAI
DriverOpenAI-compatible
Env VarAZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT
Base URLhttps://<your-resource>.openai.azure.com/openai/deployments/<deployment>
Key RequiredYes
Free TierNo
Authapi-key header
ModelsGPT-4o, GPT-4, and other Azure-hosted models

Setup:

  1. Create an Azure OpenAI resource in the Azure Portal
  2. Deploy a model in Azure OpenAI Studio
  3. Set environment variables:
    export AZURE_OPENAI_API_KEY="..."
    export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
    

GitHub Models (Azure AI Inference)

Display NameMicrosoft
Provider IDmicrosoft (alias: github-models)
DriverOpenAI-compatible
Env VarGITHUB_MODELS_TOKEN
Base URLhttps://models.inference.ai.azure.com
Key RequiredYes
Free TierYes (rate-limited per GitHub plan)
AuthAuthorization: Bearer header

Setup:

  1. Sign in to github.com/marketplace/models
  2. Generate a fine-grained Personal Access Token (no repository scopes are required for the Models endpoint)
  3. export GITHUB_MODELS_TOKEN="ghp_..." (the same PAT works as GITHUB_TOKEN for github-copilot, but the env vars are kept separate so configuring one product doesn't auto-activate the other)

Minimal config.toml:

[default_model]
provider = "microsoft"
model = "phi-4"

Notes: Despite the microsoft provider id, this is the GitHub Models / Azure AI Inference endpoint, distinct from azure-openai (which addresses your own Azure OpenAI deployments via AZURE_OPENAI_ENDPOINT) and from github-copilot (which is the IDE-side Copilot subscription). Models are catalog-driven and include third-party hosts like meta-llama-3-70b-instruct, mistral-large, phi-4, etc.


Qwen (DashScope)

Display NameQwen
DriverOpenAI-compatible
Env VarDASHSCOPE_API_KEY
Base URLhttps://dashscope.aliyuncs.com/compatible-mode/v1
Aliasesdashscope, model_studio
Key RequiredYes
Free TierYes (limited credits on signup)
AuthAuthorization: Bearer header

Regions:

RegionEndpointAPI Key Env
(default)dashscope.aliyuncs.comDASHSCOPE_API_KEY
intldashscope-intl.aliyuncs.comDASHSCOPE_API_KEY
usdashscope-us.aliyuncs.comDASHSCOPE_API_KEY

Setup:

  1. Sign up at DashScope Console
  2. Create an API key
  3. export DASHSCOPE_API_KEY="sk-..."
  4. Optionally select a region in config.toml:
    [provider_regions]
    qwen = "intl"    # or "us"
    

Notes: Qwen uses Alibaba Cloud's DashScope API. The default endpoint serves mainland China; use the intl or us region for lower latency outside China. Models are defined in the registry TOML and loaded at boot.


MiniMax

Display NameMiniMax
DriverOpenAI-compatible
Env VarMINIMAX_API_KEY
Base URLhttps://api.minimax.io/v1
Key RequiredYes
Free TierNo
AuthAuthorization: Bearer header

Regions:

RegionEndpointAPI Key Env
(default)api.minimax.ioMINIMAX_API_KEY
chinaapi.minimaxi.comMINIMAX_CN_API_KEY

Setup:

  1. Sign up at minimax.io (international) or minimaxi.com (China)
  2. Create an API key
  3. export MINIMAX_API_KEY="..."
  4. For China region:
    [provider_regions]
    minimax = "china"
    
    export MINIMAX_CN_API_KEY="..."
    

Media Generation: In addition to LLM chat models, MiniMax provides media generation capabilities when used with the Creator Hand or media API endpoints:

ModalityModelDescription
Imageimage-01Text-to-image generation
TTSspeech-2.8-hdHigh-quality text-to-speech
VideoT2V-01Text-to-video generation (async)
Musicmusic-2.5Music generation with optional lyrics

These are accessed via /api/media/* endpoints or through the image_generate, video_generate, music_generate, text_to_speech tools.

Notes: MiniMax international (minimax.io) and China (minimaxi.com) use separate API keys. When selecting the china region, LibreFang automatically reads from MINIMAX_CN_API_KEY instead of MINIMAX_API_KEY.


Qianfan (Baidu)

Display NameQianfan
Provider IDqianfan (alias: baidu)
DriverOpenAI-compatible
Env VarQIANFAN_API_KEY
Base URLhttps://qianfan.baidubce.com/v2
Key RequiredYes
Free TierLimited credits on signup
AuthAuthorization: Bearer header

Setup:

  1. Sign up at the Baidu Qianfan console
  2. Create an API key under the IAM panel
  3. export QIANFAN_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "qianfan"
model = "ernie-4.0-8k"

Notes: Qianfan is Baidu's LLM platform; the ernie-4.0, ernie-4.0-turbo, and ernie-speed families are exposed via the OpenAI-compatible v2 endpoint. Tools and function calling are supported on ernie-4.0+.


VolcEngine (Doubao)

Display NameVolcEngine / Doubao
Provider IDvolcengine (alias: doubao)
DriverOpenAI-compatible
Env VarVOLCENGINE_API_KEY
Base URLhttps://ark.cn-beijing.volces.com/api/v3
Key RequiredYes
Free TierFree credits on signup
AuthAuthorization: Bearer header

Setup:

  1. Sign up at the VolcEngine console
  2. Create an inference endpoint in Ark > Endpoints and note its model ID (ep-...)
  3. Generate an API key under Ark > API Keys
  4. export VOLCENGINE_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "volcengine"
model = "ep-20240101120000-abcde"   # endpoint ID, not the model name

Notes: Doubao is ByteDance's LLM product, exposed via VolcEngine's Ark inference platform. Unlike most providers, the model field must be the endpoint ID (ep-...), not the model family name. Vision and tools are supported on Doubao 1.5 / 1.6 endpoints. The Coding Plan endpoint (https://ark.cn-beijing.volces.com/api/coding/v3) is registered as the separate provider volcengine_coding, which uses its own VOLCENGINE_CODING_API_KEY — set the same key value under both env vars if you use both endpoints.


BytePlus ModelArk

International edition of VolcEngine. Same Ark platform, different region (ap-southeast) and pricing in USD. The two endpoints below have separate env vars (BYTEPLUS_API_KEY for byteplus, BYTEPLUS_CODING_API_KEY for byteplus_coding) — the same BytePlus account key works for both, you just export it under each name you actually use.

byteplus — standard /api/v3 endpoint

Display NameBytePlus ModelArk
Provider IDbyteplus
DriverOpenAI-compatible
Env VarBYTEPLUS_API_KEY
Base URLhttps://ark.ap-southeast.bytepluses.com/api/v3
Key RequiredYes
AuthAuthorization: Bearer header
[default_model]
provider = "byteplus"
model = "seed-2-0-pro-260328"       # versioned snapshot, see byteplus.toml

byteplus_coding — Anthropic-compatible coding endpoint

Display NameBytePlus Coding Plan
Provider IDbyteplus_coding
DriverAnthropic-compatible (/v1/messages)
Env VarBYTEPLUS_CODING_API_KEY
Base URLhttps://ark.ap-southeast.bytepluses.com/api/coding
Authx-api-key + anthropic-version headers
[default_model]
provider = "byteplus_coding"
model = "ark-code-latest"           # auto-routed friendly alias

Setup:

  1. Sign up at the BytePlus console
  2. Activate the models you want under ModelArk > Model Inventory
  3. Generate an API key under ModelArk > API Keys
  4. export BYTEPLUS_API_KEY="..." (for the standard byteplus endpoint)
  5. export BYTEPLUS_CODING_API_KEY="..." (for the byteplus_coding endpoint — same key value, separate env)

Notes: byteplus uses versioned snapshot IDs like seed-2-0-pro-260328. byteplus_coding uses friendly aliases like ark-code-latest (auto-router), dola-seed-2.0-pro, kimi-k2.5, glm-4.7, gpt-oss-120b — these stay stable across model upgrades.


Zhipu (GLM)

Display NameZhipu GLM
Provider IDzhipu (alias: glm)
DriverOpenAI-compatible
Env VarZHIPU_API_KEY
Base URLhttps://open.bigmodel.cn/api/paas/v4
Key RequiredYes
Free TierYes (per-month free quota)
AuthAuthorization: Bearer header

Setup:

  1. Sign up at the Zhipu open platform
  2. Generate an API key under Account > API Keys
  3. export ZHIPU_API_KEY="..."

Minimal config.toml:

[default_model]
provider = "zhipu"
model = "glm-4-plus"

Notes: Zhipu develops the GLM (General Language Model) family. Tools, vision, and embeddings supported. z.ai (below) is the international-region front-end of the same backend but is registered as its own provider with its own ZAI_API_KEY. The Coding Plan endpoints (zhipu_coding, zai_coding) similarly use their own ZHIPU_CODING_API_KEY and ZAI_CODING_API_KEY. Register once at Zhipu and re-export the same key value under each env var you actually use.


Zhipu Coding (CodeGeex)

Display NameCodeGeex
Provider IDzhipu-coding (alias: codegeex)
DriverOpenAI-compatible
Env VarZHIPU_CODING_API_KEY
Base URLhttps://open.bigmodel.cn/api/coding/paas/v4
Key RequiredYes
Free TierYes (counted against the same Zhipu quota)
AuthAuthorization: Bearer header

Setup:

  1. Use the same Zhipu API key from above (register once on the Zhipu open platform)
  2. export ZHIPU_CODING_API_KEY="..." (same value as ZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)

Minimal config.toml:

[default_model]
provider = "zhipu-coding"
model = "codegeex-4"

Notes: Coding-specialized endpoint serving CodeGeex models. Same auth as Zhipu/GLM but different base URL — use this provider when you want fill-in-the-middle / repo-aware coding completions instead of generic chat.


Z.ai

Display NameZ.ai
Provider IDz.ai
DriverOpenAI-compatible
Env VarZAI_API_KEY
Base URLhttps://api.z.ai/api/paas/v4
Key RequiredYes
Free TierYes (counted against the same Zhipu quota)
AuthAuthorization: Bearer header

Setup:

  1. Use the same Zhipu API key (register once on the Zhipu open platform)
  2. export ZAI_API_KEY="..." (same value as ZHIPU_API_KEY, separate env so configuring Zhipu chat doesn't auto-activate this endpoint)

Minimal config.toml:

[default_model]
provider = "z.ai"
model = "glm-4-plus"

Notes: Branded Z.ai endpoint — same Zhipu backend, different routing for international traffic. The coding variant zai_coding (base URL https://api.z.ai/api/coding/paas/v4) is registered as its own provider; it uses ZAI_CODING_API_KEY (re-export the same Zhipu key value).


Vertex AI

Display NameGoogle Vertex AI
DriverNative Gemini (generateContent API via Vertex)
Config Section[vertex_ai]
Env VarGOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT, VERTEX_LOCATION
Base URLhttps://<location>-aiplatform.googleapis.com
Key RequiredYes (service account JSON or gcloud CLI)
Free TierNo
AuthOAuth2 service account or gcloud auth print-access-token
ModelsGemini models via Google Cloud Vertex AI enterprise endpoint

Setup:

  1. Enable Vertex AI API in Google Cloud Console
  2. Either create a service account key file, or authenticate with gcloud auth application-default login
  3. Set environment variables:
    # Option A: Service account key file
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
    # Option B: gcloud CLI (no key file needed)
    gcloud auth application-default login
    export VERTEX_PROJECT="your-gcp-project"
    export VERTEX_LOCATION="us-central1"
    
  4. Configure in config.toml:
    [vertex_ai]
    project = "your-gcp-project"
    location = "us-central1"
    

Notes: Vertex AI uses the same Gemini generateContent API format as the native Gemini driver but authenticates via Google Cloud OAuth2 instead of an API key. Access tokens are cached with a ~50-minute TTL and auto-refreshed before expiry. The endpoint format is https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent.